The present invention pertains to the field of biology. More particularly, the subject of the present invention is the identification of a nucleotide sequence which makes it possible, in particular, to distinguish an infection resulting from Mycobacterium tuberculosis from an infection resulting from Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis or Mycobacterium bovis BCG. The subject of the present invention is also a method for detecting the sequences in question or the products of expression of these sequences, and the kits for carrying out these methods. Finally, the subject of the present invention is also novel vaccines.

Despite more than a century of research since the discovery of Mycobacterium tuberculosis, the aetiological agent of tuberculosis, this disease remains one of the major causes of human mortality. M. tuberculosis is expected to kill 3 million people annually (Snider, 1989 Rev. Inf. Dis. S335), and the number of people newly infected each year is rising and is estimated at 8.8 million. Although the majority of these are in developing countries, the disease is assuming renewed importance in Western countries due to the increasing number of homeless people, the impact of the AIDS epidemic, and changing global migration and travel patterns.

Early tuberculosis often goes unrecognized in an otherwise healthy individual. Classical initial methods of diagnosis include examination of a sputum smear under a microscope for acid-fast mycobacteria and an x-ray of the lungs. However, in the vast majority of cases the sputum smear examination is negative for mycobacteria in the early stages of the disease, and lung changes may not be obvious on an x-ray until several months following infection. Another complicating factor is that acid-fast bacteria in a sputum smear may often be other species of mycobacteria. Antibiotics used for treating tuberculosis have considerable side effects and must be taken as a combination of three or more drugs for a six- to twelve-month period. In addition, the possibility of inducing the appearance of drug-resistant tuberculosis prevents therapy from being administered without solid evidence to support the diagnosis. Currently, the only absolutely reliable method of diagnosis is based on culturing M. tuberculosis from the clinical specimen and identifying it morphologically and biochemically. This usually takes anywhere from three to six weeks, during which time a patient may become seriously ill and infect other individuals. Therefore, a rapid test capable of reliably detecting the presence of M. tuberculosis is vital for early detection and treatment. Several molecular tests have recently been developed for the rapid detection and identification of M. tuberculosis, such as the Gen-Probe "Amplified Mycobacterium tuberculosis Direct Test"; this test amplifies M. tuberculosis 16S ribosomal RNA from respiratory specimens and uses a chemiluminescent probe to detect the amplified product
with a reported sensitivity of about 91%. The discovery of the IS6110 insertion element 
(Cave et al.; Eisenach et al. 1990 J. Infectious Diseases 161: 977-981; Thierry et al. 1990 J. Clin. Microbiol. 28: 2668-2673) and the belief that this element may only be present in the Mycobacterium complex (M. tuberculosis, M. bovis, M. bovis BCG, M. africanum, M. canettii and M. microti) spawned a whole series of rapid diagnostic strategies (Brisson-Noel et al. 1991 Lancet 338: 364-366; Clarridge et al. 1993 J. Clin. Microbiol. 31: 2049-2056; Cormican et al. 1992 J. Clin. Pathology 45: 601-604; Cousins et al. 1992 J. Clin. Microbiol. 30: 255-258; Del Portillo et al. 1991 J. Clin. Microbiol. 29: 2163-2168; Folgueira et al. 1994 Neurology 44: 1336-1338; Forbes et al. 1993 J. Clin. Microbiol. 31: 1688-1694; Hermans et al. 1990 J. Clin. Microbiol. 28: 1204-1213; Kaltwasser et al. 1993 Mol. Cell. Probes 7: 465-470; Kocagoz et al. 1993 J. Clin. Microbiol. 31: 1435-1438; Kolk et al. 1992 J. Clin. Microbiol. 30: 2567-2575; Kox et al. 1994 J. Clin. Microbiol. 32: 672-678; Liu et al. 1994 Neurology 44: 1161-1164; Miller et al. 1994 J. Clin. Microbiol. 32: 393-397; Reischl et al. 1994 Biotechniques 17: 844-845; Schluger et al. 1994 Chest 105: 1116-1121; Shawar et al. 1993 J. Clin. Microbiol. 31: 61-65; Wilson et al. 1993 J. Clin. Microbiol. 28: 2668-2673). These tests employ various techniques to extract DNA from the sputum. PCR is used to amplify IS6110 DNA sequences from the extracted DNA. The successful amplification of this DNA is considered to be an indicator of the presence of M. tuberculosis infection. U.S. Pat. Nos. 5,168,039 and 5,370,998 have been issued to Crawford et al. for the IS6110-based detection of tuberculosis. European patent EP 0,461,045 has been issued to Guesdon for the IS6110-based detection of tuberculosis.

Thus, these molecular assays used to detect M. tuberculosis depend on the IS6110 insertion sequence (about 10 copies) or on the 16S ribosomal RNA (thousands of copies). However, these methods do not provide any information regarding the sub-type of the mycobacteria. Indeed, several dozen species of mycobacteria are known, and most are non-pathogenic for humans; tuberculosis is usually caused by infection due to M. tuberculosis, with a few cases being caused by M. bovis, M. canettii and M. africanum. In order to choose an appropriate treatment and to conduct epidemiological investigations, it is absolutely necessary to be able to rapidly and accurately identify isolates, i.e. to distinguish the sub-type of mycobacteria of the Mycobacterium complex originating from potential tuberculosis patients. That is the problem the present invention intends to solve.
The present invention provides an isolated or purified nucleic acid from the Mycobacterium complex, wherein said nucleic acid is selected from the group consisting of:

a) SEQ ID N°1, named TbD1 region;

b) Nucleic acid having a sequence fully complementary to SEQ ID N°1;

c) Nucleic acid fragment comprising at least 8, 12, 15, 20, 25, 30, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000 consecutive nucleotides of SEQ ID N°1;

d) Nucleic acid having at least 90% sequence identity after optimal alignment with a sequence defined in a) or b);

e) Nucleic acid that hybridizes under stringent conditions with the nucleic acid defined in a) or b).
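To make categories b) and c) concrete, the following minimal Python sketch checks whether a candidate sequence is a run of at least a given number of consecutive nucleotides of a reference sequence or of its fully complementary strand (read 5'→3'). The reference used in the example is a short hypothetical string, not SEQ ID N°1, and the function names are illustrative only; the identity- and hybridization-based categories d) and e) are not covered.

```python
# Complementary bases; the complementary strand is written 5'->3',
# i.e. complemented and then reversed.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the fully complementary strand of seq, read 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_consecutive_fragment(candidate: str, reference: str, min_length: int = 8) -> bool:
    """True if candidate is a run of at least min_length consecutive
    nucleotides of reference or of its complementary strand."""
    candidate = candidate.upper()
    if len(candidate) < min_length:
        return False
    reference = reference.upper()
    return candidate in reference or candidate in reverse_complement(reference)


# Hypothetical 16-nt reference standing in for SEQ ID N°1.
reference = "ATGCGTACCGGTTAGC"
print(is_consecutive_fragment("GCGTACCG", reference))  # True (direct strand)
print(is_consecutive_fragment("CCGGTACG", reference))  # True (complementary strand)
print(is_consecutive_fragment("ATG", reference))       # False (shorter than 8 nt)
```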

As used herein, the terms "isolated" and "purified" according to the invention refer to a level of purity that is achievable using current technology. The molecules of the invention do not need to be absolutely pure (i.e., contain absolutely no molecules of other cellular macromolecules), but should be sufficiently pure so that one of ordinary skill in the art would recognize that they are no longer present in the environment in which they were originally found (i.e., the cellular milieu). Thus, a purified or isolated molecule according to the present invention is one that has been removed from at least one other macromolecule present in the natural environment in which it was found. More preferably, the molecules of the invention are essentially purified and/or isolated, which means that the composition in which they are present is almost completely, or even absolutely, free of other macromolecules found in the environment in which the molecules of the invention are originally found. Isolation and purification thus do not occur by addition or removal of salts, solvents, or elements of the periodic table, but must include the removal of at least some macromolecules. The nucleic acids encompassed by the invention are purified and/or isolated by any appropriate technique known to the ordinary artisan. Such techniques are widely known, commonly practiced, and well within the skill of the ordinary artisan. As used herein, the term "nucleic acid" refers to a polynucleotide sequence such as a single- or double-stranded DNA sequence, RNA sequence or cDNA sequence; such a polynucleotide sequence has been isolated, purified or synthesized and may be constituted of natural or non-natural nucleotides. In a preferred embodiment, the DNA molecule of the invention is a double-stranded DNA molecule. As used herein, the terms "nucleic acid", "oligonucleotide" and "polynucleotide" have the same meaning and are used interchangeably.

By the term "Mycobacterium complex" as used herein, it is meant the complex of mycobacteria causing tuberculosis, which are Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium africanum, Mycobacterium microti, Mycobacterium canettii and the vaccine strain Mycobacterium bovis BCG.

The present invention encompasses not only the entire sequence SEQ ID N°1, its complement, and its double-stranded form, but also any fragment of this sequence, its complement, and its double-stranded form.

In embodiments, the fragment of SEQ ID N°1 comprises at least approximately 8 nucleotides. For example, the fragment can be between approximately 8 and 30 nucleotides and can be designed as a primer for polynucleotide synthesis. In another preferred embodiment, the fragment of SEQ ID N°1 comprises between approximately 1,500 and approximately 2,500 nucleotides, and more preferably 2,153 nucleotides corresponding to SEQ ID N°4 (see figure 5). As used herein, "nucleotides" is used in reference to the number of nucleotides on a single-stranded nucleic acid. However, the term also encompasses double-stranded molecules. Thus, a fragment comprising 2,153 nucleotides according to the invention is a single-stranded molecule comprising 2,153 nucleotides, and also a double-stranded molecule comprising 2,153 base pairs (bp).

In a preferred embodiment, the nucleic acid fragment of the invention is specifically deleted in the genome of Mycobacterium tuberculosis, except in Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, and is present in the genome of Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis and Mycobacterium bovis BCG. By the term "few IS6110 sequences inserted in the genome", it is meant fewer than ten copies in the genome of M. tuberculosis, more preferably fewer than 5 copies, for example fewer than two copies.

The nucleic acid fragment of the invention is preferably selected from the group consisting of:

a) SEQ ID N°4;

b) Nucleic acid having a sequence fully complementary to SEQ ID N°4;

c) Nucleic acid fragment comprising at least 8, 12, 15, 20, 25, 30, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000 consecutive nucleotides of SEQ ID N°4;

d) Nucleic acid having at least 90% sequence identity after optimal alignment with a sequence defined in a) or b);

e) Nucleic acid that hybridizes under stringent conditions with the nucleic acid defined in a) or b).

In embodiments, the stringent conditions under which a sequence according to the invention is determined are conditions which are no less stringent than 5X SSPE, 2X Denhardt's solution, and 0.5% (w/v) sodium dodecyl sulfate at 65°C. More stringent conditions can be utilized by the ordinary artisan, and the proper conditions for a given assay can be easily and rapidly determined without undue or excessive experimentation. As an illustrative embodiment, the stringent hybridization conditions used in order to specifically detect a polynucleotide according to the present invention are advantageously the following: pre-hybridization and hybridization are performed at 65°C in a mixture containing:

- 5X SSPE (1X SSPE is 3 M NaCl, 30 mM tri-sodium citrate)

- 2X Denhardt's solution

- 0.5% (w/v) sodium dodecyl sulfate (SDS)

- 100 µg ml⁻¹ salmon sperm DNA.

The washings are performed as follows:

- two washings at laboratory temperature (approximately 21-25°C) for 10 min. in the presence of 2X SSPE and 0.1% SDS; and

- one washing at 65°C for 15 min. in the presence of 1X SSPE and 0.1% SDS.
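A hybridization mixture of this kind is usually prepared by diluting concentrated stocks, applying C1·V1 = C2·V2 to each component. The sketch below assumes stock concentrations that the text does not specify (20X SSPE, 50X Denhardt's solution, 10% (w/v) SDS and 10 mg/ml salmon sperm DNA); the class and function names are hypothetical, and the stocks should be adjusted to those actually available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    stock: float  # concentration of the stock solution
    final: float  # desired final concentration (same unit as stock)


def volumes_ml(components, total_ml):
    """Volume of each stock (ml) from C1*V1 = C2*V2; water makes up the rest."""
    volumes = {}
    for c in components:
        if c.final > c.stock:
            raise ValueError(f"{c.name}: final concentration exceeds stock")
        volumes[c.name] = c.final * total_ml / c.stock
    water = total_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested volume")
    volumes["water"] = water
    return volumes


# Assumed (hypothetical) stocks; final concentrations are those listed above.
HYBRIDIZATION_MIX = [
    Component("SSPE", stock=20.0, final=5.0),              # X units
    Component("Denhardt's", stock=50.0, final=2.0),        # X units
    Component("SDS", stock=10.0, final=0.5),               # % (w/v)
    Component("salmon sperm DNA", stock=10.0, final=0.1),  # mg/ml (100 µg/ml)
]

for name, ml in volumes_ml(HYBRIDIZATION_MIX, total_ml=100.0).items():
    print(f"{name:>18}: {ml:6.2f} ml")
```

Under these assumed stocks, 100 ml of mixture takes 25 ml of SSPE stock, 4 ml of Denhardt's stock, 5 ml of SDS stock, 1 ml of salmon sperm DNA stock and 65 ml of water.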



The invention also encompasses the isolated or purified nucleic acid of the invention wherein said nucleic acid comprises at least a deletion of a nucleic acid fragment as defined above. Preferably, such an isolated or purified nucleic acid of the invention is SEQ ID N°21, which corresponds to SEQ ID N°1 in which SEQ ID N°4 is deleted (absent).



Polynucleotides of the invention can be characterized by the percentage of identity they show with the sequences disclosed herein. For example, polynucleotides having at least 90% identity with the polynucleotides of the invention, particularly those sequences of the sequence listing, are encompassed by the invention. Preferably, the sequences show at least 90% identity with those of the sequence listing. More preferably, they show at least 92% identity, for example 95% or 99% identity. The skilled artisan can identify sequences according to the invention through the use of the sequence analysis software BLAST (see, for example, Coffin et al., eds., "Retroviruses", Cold Spring Harbor Laboratory Press, pp. 723-755). Percent identity is calculated using the BLAST sequence analysis program suite, Version 2, available at the NCBI (NIH), with all default parameters. BLAST (Basic Local Alignment Search Tool) is the heuristic search algorithm employed by the programs blastp, blastn, blastx, tblastn and tblastx, all of which are available through the BLAST analysis software suite at the NCBI. These programs ascribe significance to their findings using the statistical methods of Karlin and Altschul (1990, 1993) with a few enhancements. Using this publicly available sequence analysis program suite, the skilled artisan can easily identify polynucleotides according to the present invention.
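In BLAST output, the "Identities" figure is the number of identical positions over the length of the aligned region, gap columns included. The sketch below applies that definition to a pairwise alignment that has already been produced (for example by BLAST with default parameters) and compares the result with the 90% threshold mentioned above. It does not perform the alignment itself, and the function name and example alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences carry the same
    nucleotide; gap columns count toward the length but never as matches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-column alignment: one gap column and one mismatch.
query = "ACGT-ACGTA"
subject = "ACGTTACGAA"
identity = percent_identity(query, subject)
print(f"{identity:.1f}% identity; meets 90% threshold: {identity >= 90.0}")
# 80.0% identity; meets 90% threshold: False
```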

It is well within the skill of the ordinary artisan to identify regions of the nucleic acid sequence of the invention which would be useful as a probe, primer, or other experimental, diagnostic, or therapeutic aid. For example, the ordinary artisan could utilize any of the widely available sequence analysis programs to select regions (fragments) of these sequences that are useful for hybridization assays such as Southern blots, Northern blots, DNA binding assays, and/or in vitro, in situ, or in vivo hybridizations. Additionally, the ordinary artisan, with the sequences of the present invention, can utilize widely available sequence analysis programs to identify regions that can be used as probes and primers, as well as for the design of anti-sense molecules. The only practical limitation on the fragment chosen by the ordinary artisan is the ability of the fragment to be useful for the purpose for which it is chosen. For example, if the ordinary artisan wished to choose a hybridization probe, he would know how to choose one of sufficient length, and of sufficient stability, to give meaningful results. The conditions chosen would be those typically used in hybridization assays developed for nucleic acid fragments of the approximate chosen length.

Thus, the present invention provides short oligonucleotides, such as those useful as probes and primers. In embodiments, the probe and/or primer comprises 8 to 30 consecutive nucleotides of the polynucleotide according to the invention or of the polynucleotide complementary thereto. Advantageously, a fragment as defined herein has a length of at least 8 nucleotides, which is approximately the minimal length that has been determined to allow specific hybridization. Preferably, the nucleic acid fragment has a length of at least 12 nucleotides, and more preferably of at least 20 consecutive nucleotides of either SEQ ID N°1 or SEQ ID N°4. The sequence of the oligonucleotide can be any of the many possible sequences according to the invention. Preferably, the sequence is selected from the following group: SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID N°17, SEQ ID N°18. More precisely, the primers SEQ ID N°13, SEQ ID N°14, SEQ ID N°15 and SEQ ID N°16 are contained in the nucleic acid fragment SEQ ID N°4. The primers SEQ ID N°17 and SEQ ID N°18 are contained in the nucleic acid sequence SEQ ID N°1 and flank the nucleic acid fragment of SEQ ID N°4 (see figure 5).

Thus, the polynucleotides of SEQ ID N°1 and SEQ ID N°4, and their fragments, can be used to select nucleotide primers, notably for an amplification reaction, such as the amplification reactions described below.

PCR is described in US Patent No. 4,683,202, which is incorporated herein in its entirety. The amplified fragments may be identified by agarose or polyacrylamide gel electrophoresis, by capillary electrophoresis, or alternatively by a chromatography technique (gel filtration, hydrophobic chromatography, or ion exchange chromatography). The specificity of the amplification can be ensured by molecular hybridization using as nucleic probes the polynucleotides of SEQ ID N°1 or SEQ ID N°4 and their fragments, oligonucleotides that are complementary to these polynucleotides or fragments thereof, or their amplification products themselves, and/or by DNA sequencing.

The following other techniques related to nucleic acid amplification may also be used and are generally preferred to the PCR technique. The Strand Displacement Amplification (SDA) technique is an isothermal amplification technique based on the ability of a restriction enzyme to cleave one of the strands at a recognition site (which is in a hemiphosphorothioate form), on the property of a DNA polymerase to initiate the synthesis of a new strand from the 3'-OH end generated by the restriction enzyme, and on the property of this DNA polymerase to displace the previously synthesized strand located downstream. The SDA amplification technique is more easily performed than PCR (only a single thermostatted water bath is necessary) and is faster than the other amplification methods. Thus, the present invention also comprises using the nucleic acid fragments according to the invention (primers) in a method of DNA or RNA amplification according to the SDA technique.

When the target polynucleotide to be detected is an RNA, for example an mRNA, a reverse transcriptase enzyme will be used before the amplification reaction in order to obtain a cDNA from the RNA contained in the biological sample. The generated cDNA is subsequently used as the nucleic acid target for the primers or the probes used in an amplification process or a detection process according to the present invention.

The non-labeled polynucleotides or oligonucleotides of the invention can be used directly as probes. Nevertheless, the polynucleotides or oligonucleotides are generally labeled with a radioactive element (³²P, ³⁵S, ³H, ¹²⁵I) or with a non-isotopic molecule (for example, biotin, acetylaminofluorene, digoxigenin, 5-bromodeoxyuridine, fluorescein) in order to generate probes that are useful for numerous applications. Examples of non-radioactive labeling of nucleic acid fragments are described in French patent N° FR 78 10975 and by Urdea et al. (1988, Nucleic Acids Research 11:4937-4957) or Sanchez-Pescador et al. (1988, J. Clin. Microbiol. 26(10):1934-1938), the disclosures of which are hereby incorporated in their entirety. Other labeling techniques can also be used, such as those described in French patents FR 2 422 956 and FR 2 518 755. The hybridization step may be performed in different ways. See, for example, Matthews et al., 1988, Anal. Biochem. 169:1-25. A general method comprises immobilizing the nucleic acid that has been extracted from the biological sample on a substrate (for example, nitrocellulose, nylon, polystyrene) and then incubating, in defined conditions, the target nucleic acid with the probe. Subsequent to the hybridization step, the excess amount of the specific probe is discarded and the hybrid molecules formed are detected by an appropriate method (radioactivity, fluorescence or enzyme activity measurement, etc.).

Amplified nucleotide fragments are useful, among other things, as probes used in hybridization reactions in order to detect the presence of a polynucleotide according to the present invention or in order to detect mutations. The primers may also be used as oligonucleotide probes to specifically detect a polynucleotide according to the invention.

The oligonucleotide probes according to the present invention may also be used in a detection device comprising a matrix library of probes immobilized on a substrate, the sequence of each probe of a given length being shifted by one or several bases relative to the next, each probe of the matrix library thus being complementary to a distinct sequence of the target nucleic acid. Optionally, the substrate of the matrix may be a material able to act as an electron donor, the matrix positions at which hybridization has occurred then being determined by an electronic device. Such matrix libraries of probes and methods of specific detection of a target nucleic acid are described in European patent application N° EP-0 713 016 (Affymax Technologies) and in US patent N° US-5,202,231 (Drmanac). Since almost the whole length of a mycobacterial chromosome is covered by BAC-based genomic DNA libraries (i.e. 97% of the M. tuberculosis chromosome is covered by the BAC library 1-1945), these DNA libraries will play an important role in a plurality of post-genomic applications, such as mycobacterial gene expression studies, where the canonical set of BACs could be used as a matrix for hybridization studies. Thus, it is also within the scope of the invention to provide nucleic acid chips, more precisely DNA chips or protein chips, that respectively comprise a nucleic acid or a polypeptide of the invention.
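The matrix library of shifted probes described above can be pictured as a window of fixed length sliding along the target, stepping by one or several bases, with each probe being the complementary strand of its window. The following sketch, whose function name and parameters are hypothetical, enumerates such probes; it says nothing about probe chemistry, placement on the substrate or hybridization conditions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tiling_probes(target: str, probe_length: int, shift: int = 1) -> list[tuple[int, str]]:
    """Return (start, probe) pairs, where each probe is the complementary
    strand (5'->3') of target[start:start + probe_length] and successive
    windows are offset by `shift` bases."""
    if probe_length <= 0 or shift <= 0:
        raise ValueError("probe_length and shift must be positive")
    target = target.upper()
    probes = []
    for start in range(0, len(target) - probe_length + 1, shift):
        window = target[start:start + probe_length]
        probes.append((start, window.translate(_COMPLEMENT)[::-1]))
    return probes


# Hypothetical 8-nt target, 4-nt probes shifted by 2 bases.
for start, probe in tiling_probes("AACCGGTT", probe_length=4, shift=2):
    print(start, probe)
# 0 GGTT
# 2 CCGG
# 4 AACC
```

With `shift=1`, every possible window is represented, which corresponds to the one-base offset mentioned in the text.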

The present invention also provides a vector comprising the isolated DNA molecule of the invention. A "vector" is a replicon to which another polynucleotide segment is attached, so as to bring about the replication and/or expression of the attached segment. A vector can have one or more restriction endonuclease recognition sites at which the DNA sequences can be cut in a determinable fashion without loss of an essential biological function of the vector, and into which a DNA fragment can be spliced in order to bring about its replication and cloning. Vectors can further provide primer sites (e.g. for PCR), transcriptional and/or translational initiation and/or regulation sites, recombinational signals, replicons, selectable markers, etc. Besides the use of homologous recombination or restriction enzymes to insert a desired DNA fragment into the vector, UDG cloning of PCR fragments (US Pat. No. 5,334,575), T:A cloning, and the like can also be applied. The cloning vector can further contain a selectable marker suitable for use in the identification of cells transformed with the cloning vector.

The vector can be any useful vector known to the ordinary artisan, including, but not limited to, a cloning vector, an insertion vector, or an expression vector. Examples of vectors include plasmids, phages, cosmids, phagemids, yeast artificial chromosomes (YAC), bacterial artificial chromosomes (BAC), human artificial chromosomes (HAC), viral vectors such as adenoviral and retroviral vectors, and other DNA sequences which are able to replicate or to be replicated in vitro or in a host cell, or to convey a desired DNA segment to a desired location within a host cell.

According to a preferred embodiment of the invention, the recombinant vector is the BAC pBeloBAC11 into whose HindIII restriction site has been inserted the genomic region of Mycobacterium bovis BCG 1173P3 that spans the region corresponding to the locus 1,760,753 bp to 1,830,364 bp in the genome of M. tuberculosis H37Rv; this recombinant vector is named X229. In this region, the inventors have demonstrated the deletion of a 2153 bp fragment, corresponding to SEQ ID N°4, in the vast majority of M. tuberculosis strains, except strains of M. tuberculosis having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome. For this reason, the inventors named this 2153 bp deletion TbD1 ("M. tuberculosis specific deletion 1"). TbD1 is flanked by the sequences GGC CTG GTC AAA CGC GGC TGG ATG CTG and AGA TCC GTC TTT GAC ACG ATC GAC G. External primers hybridizing with these sequences outside TbD1, or with the complementary sequences thereof, can be used for the amplification of TbD1 to check for the presence or absence of the TbD1 deletion. The inventors designed, for example, the following primers:

5'- CTA CCT CAT CTT CCG GTC CA-3' (SEQ ID N°17)
5'- CAT AGA TCC CGG ACA TGG TG-3' (SEQ ID N°18)
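Because the external primers bind outside TbD1, the size of their product reports whether the 2153 bp TbD1 fragment is present. The sketch below performs a simple exact-match in silico PCR with SEQ ID N°17 and SEQ ID N°18 on two hypothetical toy templates, one carrying a 2153 bp stand-in for TbD1 and one lacking it. The filler bases, template layout and function names are assumptions for illustration; real amplicon sizes depend on the positions of the primers within SEQ ID N°1, which are not reproduced here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Drop the spaces used in the triplet notation and upper-case."""
    return seq.replace(" ", "").upper()


def reverse_complement(seq: str) -> str:
    return clean(seq).translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Exact-match in silico PCR: length of the product running from the
    forward primer to the reverse primer's binding site, or None."""
    template = clean(template)
    fwd = clean(forward)
    # On the given strand, the reverse primer's site reads as its reverse complement.
    rev_site = reverse_complement(reverse)
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


SEQ17 = "CTA CCT CAT CTT CCG GTC CA"  # external forward primer
SEQ18 = "CAT AGA TCC CGG ACA TGG TG"  # external reverse primer

# Hypothetical toy templates (filler bases are arbitrary).
tbd1_stand_in = "C" * 2153
deleted = clean(SEQ17) + "A" * 50 + reverse_complement(SEQ18)
intact = clean(SEQ17) + "A" * 25 + tbd1_stand_in + "A" * 25 + reverse_complement(SEQ18)

print(amplicon_length(deleted, SEQ17, SEQ18))  # 90
print(amplicon_length(intact, SEQ17, SEQ18))   # 2243
```

The two products differ by exactly 2153 bp, the size of TbD1, which is what an agarose gel would distinguish.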
In order to obtain a specific 500 bp probe for hybridization experiments, a PCR amplification of a fragment comprised in TbD1 may be carried out using the plasmid X229 as a template. The amplification of a fragment of approximately 500 bp contained in TbD1 can be performed by using the following primers:

5'- CGT TCA ACC CCA AAC AGG TA-3' (SEQ ID N°13)
5'- AAT CGA ACT CGT GGA ACA CC-3' (SEQ ID N°14)
The amplification of a fragment of approximately 2,000 bp contained in TbD1 can be performed by using the following primers:

5'- ATT CAG CGT CTA TCG GTT GC-3' (SEQ ID N°15)
5'- AGC AGC TCG GGA TAT CGT AG-3' (SEQ ID N°16)
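As a quick sanity check on the primers listed in this section, the sketch below computes the length, GC content and a rough melting temperature of each one using the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). The Wallace rule is only an approximation for short oligonucleotides and ignores salt and primer concentrations; it is not presented as the method used to set the PCR conditions, and the function names are hypothetical.

```python
PRIMERS = {
    "SEQ ID N°13": "CGT TCA ACC CCA AAC AGG TA",
    "SEQ ID N°14": "AAT CGA ACT CGT GGA ACA CC",
    "SEQ ID N°15": "ATT CAG CGT CTA TCG GTT GC",
    "SEQ ID N°16": "AGC AGC TCG GGA TAT CGT AG",
    "SEQ ID N°17": "CTA CCT CAT CTT CCG GTC CA",
    "SEQ ID N°18": "CAT AGA TCC CGG ACA TGG TG",
}


def bases(seq: str) -> str:
    """Strip the spaces of the triplet notation and upper-case."""
    return seq.replace(" ", "").upper()


def gc_percent(seq: str) -> float:
    s = bases(seq)
    return 100.0 * sum(b in "GC" for b in s) / len(s)


def wallace_tm(seq: str) -> int:
    """Rough Tm in °C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    s = bases(seq)
    return 2 * sum(b in "AT" for b in s) + 4 * sum(b in "GC" for b in s)


for name, seq in PRIMERS.items():
    print(f"{name}: {len(bases(seq))} nt, GC {gc_percent(seq):.0f}%, Tm ~{wallace_tm(seq)} °C")
```

With these definitions, every primer is 20 nt long, with 50 to 55% GC and a Wallace estimate of 60 to 62 °C.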
The PCR conditions are the following: denaturation at 95°C for 1 min, then 35 cycles of amplification [95°C for 30 seconds, 58°C for 1 min], then elongation at 72°C for 4 min.
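The cycling program above can be written down as data, which makes it easy to review or to estimate run time. The sketch below encodes the stated steps and sums the hold times only; heating and cooling ramps, which depend on the thermocycler, are ignored, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class PcrProgram:
    initial: tuple  # steps run once before cycling
    cycle: tuple    # steps repeated `cycles` times
    cycles: int
    final: tuple    # steps run once after cycling

    def hold_seconds(self) -> int:
        """Total time spent at the programmed temperatures (ramps excluded)."""
        once = sum(s.seconds for s in self.initial + self.final)
        return once + self.cycles * sum(s.seconds for s in self.cycle)


TBD1_PCR = PcrProgram(
    initial=(Step(95, 60),),             # denaturation, 1 min
    cycle=(Step(95, 30), Step(58, 60)),  # 30 s at 95°C, 1 min at 58°C
    cycles=35,
    final=(Step(72, 240),),              # elongation, 4 min
)

total = TBD1_PCR.hold_seconds()
print(f"{total} s = {total / 60:.1f} min")  # 3450 s = 57.5 min
```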

Thus, this invention also concerns a recombinant host cell which contains a polynucleotide or recombinant vector according to the invention. The host cell can be transformed or transfected with a polynucleotide or recombinant vector to provide transient, stable, or controlled expression of the desired polynucleotide. For example, the polynucleotide of interest can be subcloned into an expression plasmid at a cloning site downstream from a promoter in the plasmid, and the plasmid can be introduced into a host cell where expression can occur. The recombinant host cell can be any suitable host known to the skilled artisan, such as a eukaryotic cell or a microorganism. For example, the host can be a cell selected from the group consisting of Escherichia coli, Bacillus subtilis, insect cells, and yeasts. According to a preferred embodiment of the invention, the recombinant host cell is a commercially available Escherichia coli DH10B (Gibco) containing the BAC named X229 previously described. This Escherichia coli DH10B (Gibco) containing the BAC named X229 has been deposited with the Collection Nationale de Cultures de Microorganismes (CNCM), Institut Pasteur, Paris, France, on February 18, 2002 under number CNCM I-2799.

Another aspect of the invention is the product of expression of all or part of the nucleic acid according to the invention, including the nucleic acid fragment specifically deleted in the genome of Mycobacterium tuberculosis, except in Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, as defined previously. The expression "product of expression" is understood to mean any isolated or purified protein, polypeptide or polypeptide fragment resulting from the expression of all or part of the above-mentioned nucleotide sequences. These products of expression include the membrane protein mmpL6 corresponding to SEQ ID N°6, the membrane protein mmpS6 corresponding to SEQ ID N°3 or SEQ ID N°10 (the two sequences SEQ ID N°3 and SEQ ID N°10 are identical), and their truncated or rearranged forms resulting from the deletion of a nucleic acid fragment according to the invention. For example, SEQ ID N°8 is a truncated form of the mmpL6 protein, SEQ ID N°12 is a truncated form of the mmpS6 protein, and SEQ ID N°22 is a fusion product [mmpS6-mmpL6] of the rearranged mmpL6 and mmpS6 proteins.

It is now easy to produce proteins in large amounts by genetic engineering techniques through the use of expression vectors, such as plasmids, phages, and phagemids. The polypeptide of the present invention can be produced by insertion of the appropriate polynucleotide into an appropriate expression vector at the appropriate position within the vector. Such manipulation of polynucleotides is well known and widely practiced by the ordinary artisan. The polypeptide can be produced from these recombinant vectors either in vitro or in vivo. All the isolated or purified nucleic acids encoding the polypeptide of the invention are within the scope of the invention. The polypeptide of the invention is a polypeptide encoded by a polynucleotide which hybridizes to either SEQ ID N°1 or SEQ ID N°4 under stringent conditions, as defined herein.

More preferably, said isolated or purified nucleic acid according to the invention is selected from among:

- the mmpL6 gene of sequence SEQ ID N°5, contained in SEQ ID N°1 and encoding the mmpL6 protein of sequence SEQ ID N°6;

- the truncated form of the mmpL6 gene of sequence SEQ ID N°7, contained in TbD1 of sequence SEQ ID N°4 and encoding a truncated form of the mmpL6 protein of sequence SEQ ID N°8;

- the mmpS6 gene of sequence SEQ ID N°9, contained in SEQ ID N°1 and encoding the mmpS6 protein of SEQ ID N°10;

- the truncated form of the mmpS6 gene of sequence SEQ ID N°11, contained in TbD1 of sequence SEQ ID N°4 and encoding a truncated form of the mmpS6 protein of SEQ ID N°12;

- the chimeric gene of SEQ ID N°21 resulting from the fusion of both truncated mmpS6 and mmpL6 genes due to the deletion of TbD1 in the genome of M. tuberculosis, except strains of M. tuberculosis having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome. This chimeric gene encodes the fusion polypeptide [mmpS6-mmpL6] of sequence SEQ ID N°22.

The present invention also provides a method for the discriminatory detection and identification of:

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome; versus

- Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis and Mycobacterium bovis BCG in a biological sample,

comprising the following steps:

a) isolation of the DNA from the biological sample to be analyzed, or production of a cDNA from the RNA of the biological sample;

b) detection of the nucleic acid sequences of the mycobacterium present in said biological sample;

c) analysis for the presence or absence of a nucleic acid fragment specifically deleted in the genome of Mycobacterium tuberculosis, except in Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, as previously described.

According to the present invention, a biological sample is notably understood to mean a biological fluid, such as sputum, saliva, plasma, blood, urine or sperm, or a tissue, such as a biopsy.

Analysis of the desired sequences may, for example, be carried out by agarose gel electrophoresis. If a DNA fragment migrating to the expected position is observed, it can be concluded that the analyzed sample contained mycobacterial DNA. This analysis can also be carried out by molecular hybridization using a nucleic probe, advantageously labeled with a nonradioactive (cold probe) or radioactive element. Advantageously, the mycobacterial DNA sequences will be detected using nucleotide sequences complementary to said DNA sequences. By way of example, these may include labeled or unlabeled nucleotide probes; they may also include primers for amplification. The amplification technique used may be PCR, but also alternative techniques such as SDA (Strand Displacement Amplification), TAS (Transcription-based Amplification System), NASBA (Nucleic Acid Sequence Based Amplification) or TMA (Transcription Mediated Amplification).

The primers in accordance with the invention have a nucleotide sequence chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID N°17 and SEQ ID N°18. The primers SEQ ID N°13, SEQ ID N°14, SEQ ID N°15 and SEQ ID N°16 are contained in the nucleic acid fragment SEQ ID N°4, whereas the primers SEQ ID N°17 and SEQ ID N°18 are contained in the nucleic acid of the invention SEQ ID N°1 but not in the nucleic acid fragment SEQ ID N°4.

In a variant, the subject of the invention is also a method for the discriminatory detection and identification of:

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome; versus

- Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis and Mycobacterium bovis BCG in a biological sample,

comprising the following steps:

a) bringing the biological sample to be analyzed into contact with at least one pair of primers as defined above, the DNA contained in the sample having been, where appropriate, made accessible to hybridization beforehand,

b) amplification of the DNA of the mycobacterium,

c) visualization of the amplified DNA fragments.
The amplified fragments may be identified by agarose or polyacrylamide gel electrophoresis, by capillary electrophoresis or by a chromatographic technique (gel filtration, hydrophobic chromatography or ion-exchange chromatography). The specificity of the amplification may be checked by molecular hybridization using probes, plasmids containing these sequences, or their amplification products. The amplified nucleotide fragments may be used as reagents in hybridization reactions in order to detect the presence, in a biological sample, of a target nucleic acid having sequences complementary to those of said amplified nucleotide fragments. These probes and amplicons may or may not be labeled, for example with radioactive elements or with nonradioactive molecules such as enzymes or fluorescent elements.

The subject of the present invention is also a kit for the discriminatory detection and identification of:

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome; versus

- Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis and Mycobacterium bovis BCG in a biological sample,

comprising the following elements:

a) at least one pair of primers as defined previously,

b) the reagents necessary to carry out a DNA amplification reaction,

c) optionally, the necessary components which make it possible to verify or compare the sequence and/or the size of the amplified fragment.

Indeed, in the context of the present invention, very different results can be obtained depending on the pair of primers used. Thus, with primers contained in the TbD1 deletion, such as for example SEQ ID N°13, SEQ ID N°14, SEQ ID N°15 and SEQ ID N°16, no amplification product is detectable in M. tuberculosis, except in strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences in their genome, whereas an amplification product is detectable in Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG and Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome. A pair of primers located outside the TbD1 deletion, such as SEQ ID N°17 and SEQ ID N°18, is likely to give rise to an amplicon of about 2100 bp in Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG and Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, whereas in M. tuberculosis, except in strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, the same pair of primers will give rise to a much shorter amplicon, since the TbD1 region between the primers is missing.
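The primer logic above can be summarized as a small decision procedure. The following Python sketch is illustrative only: the function name, the result strings and the 1500 bp threshold separating the long (TbD1-present) outer-primer band from the shorter (TbD1-deleted) band are hypothetical assumptions, not values from this description, and would have to be replaced by the sizes validated for the actual primer pairs.

```python
from typing import Optional

# Hypothetical cutoff (assumption, not from the description): outer-primer
# bands at least this long are read as "TbD1 region present".
INTACT_MIN_BP = 1500

PRESENT = ("TbD1 present: M. africanum, M. canettii, M. microti, M. bovis, "
           "M. bovis BCG or ancestral M. tuberculosis")
DELETED = "TbD1 deleted: 'modern' M. tuberculosis"


def interpret_tbd1_pcr(inner_band: bool, outer_band_bp: Optional[int],
                       intact_min_bp: int = INTACT_MIN_BP) -> str:
    """Interpret the two PCR read-outs for the TbD1 region.

    inner_band: True if primers inside TbD1 (e.g. SEQ ID N°13-16) gave a product.
    outer_band_bp: size of the product from primers flanking TbD1
        (e.g. SEQ ID N°17/18), or None if that PCR gave no product.
    """
    if outer_band_bp is None:
        # Without the flanking product there is no amplification control,
        # so only a positive inner result can be interpreted.
        return PRESENT if inner_band else "inconclusive: no product with either pair"
    outer_says_present = outer_band_bp >= intact_min_bp
    if outer_says_present != inner_band:
        return "inconclusive: inner and outer PCR results disagree"
    return PRESENT if inner_band else DELETED


print(interpret_tbd1_pcr(False, 400))  # TbD1 deleted: 'modern' M. tuberculosis
```

In this sketch the flanking primer pair doubles as an amplification control: a missing inner-primer product is only read as a deletion when the flanking primers did amplify, which avoids mistaking a failed reaction for a TbD1 deletion.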

More generally, the invention pertains to the use of at least one pair of primers as defined previously for the amplification of a DNA sequence from Mycobacterium tuberculosis or from Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG and Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome.

The subject of the present invention is also a method for the in vitro discriminatory detection of antibodies directed against Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, versus antibodies directed against Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG and Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, in a biological sample, comprising the following steps:

a) bringing the biological sample into contact with at least one product of expression of all or part of the nucleic acid fragment specifically deleted in M. tuberculosis, except in strains of M. tuberculosis having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, as previously defined,

b) detecting the antigen-antibody complex formed.

The subject of the present invention is also a method for the in vitro discriminatory detection of a vaccination with Mycobacterium bovis BCG, or of an infection by M. bovis, M. canettii, M. microti, M. africanum or M. tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, versus an infection by Mycobacterium tuberculosis, except by Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, in a mammal, comprising the following steps:

a) preparation of a biological sample containing cells, more particularly cells of the immune system of said mammal and more particularly T cells,

b) incubation of the biological sample of step a) with at least one product of expression of all or part of the nucleic acid fragment specifically deleted in M. tuberculosis, except in strains of M. tuberculosis having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, as previously defined,

c) detection of a cellular reaction indicating prior sensitization of the mammal to said product, in particular cell proliferation and/or synthesis of proteins such as gamma-interferon. Cell proliferation may be measured, for example, by 3H-thymidine incorporation.
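The description leaves the read-out of step c) open. One common way to express a 3H-thymidine proliferation result is a stimulation index, the ratio of mean counts per minute (cpm) in antigen-stimulated cultures to mean cpm in unstimulated cultures. The Python sketch below assumes that convention; the function names and the positivity cutoff of 3.0 are hypothetical and are not taken from this document.

```python
from statistics import mean

# Hypothetical positivity threshold (assumption, not from the description).
SI_CUTOFF = 3.0


def stimulation_index(cpm_with_antigen, cpm_without_antigen):
    """Mean 3H-thymidine counts with antigen divided by mean background counts."""
    background = mean(cpm_without_antigen)
    if background <= 0:
        raise ValueError("background counts must be positive")
    return mean(cpm_with_antigen) / background


def sensitized(cpm_with_antigen, cpm_without_antigen, cutoff=SI_CUTOFF):
    """True if the cells respond to the expression product at or above the cutoff."""
    return stimulation_index(cpm_with_antigen, cpm_without_antigen) >= cutoff


# Triplicate wells: 3300 cpm mean with antigen vs 1100 cpm mean without.
print(stimulation_index([3000, 3300, 3600], [1000, 1100, 1200]))  # 3.0
```

Using replicate means rather than single wells is a design choice of the sketch; an actual assay would define its own replicate scheme and cutoff.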

The invention also relates to a kit for the in vitro discriminatory diagnosis of a vaccination with M. bovis BCG, or of an infection by M. bovis, M. canettii, M. microti or M. africanum, versus an infection by M. tuberculosis, except by strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, in a mammal, comprising:

a) a product of expression of all or part of the nucleic acid fragment specifically deleted in M. tuberculosis, except in strains of M. tuberculosis having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, as previously defined,

b) where appropriate, the reagents for the constitution of the medium suitable for the immunological reaction,

c) the reagents allowing the detection of the antigen-antibody complexes produced by the immunological reaction,

d) where appropriate, a reference biological sample (negative control) free of antibodies recognized by said product,

e) where appropriate, a reference biological sample (positive control) containing a predetermined quantity of antibodies recognized by said product.
The reagents allowing the detection of the antigen-antibody complexes may carry a marker or may be capable of being recognized in turn by a labeled reagent, more particularly in the case where the antibody used is not labeled.
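The negative and positive control samples of items d) and e) can be used to set and validate a positivity threshold for the antigen-antibody read-out. The Python sketch below assumes an optical-density (ELISA-type) read-out and a cutoff equal to the mean of the negative control plus three standard deviations; both the read-out format and the cutoff rule are assumptions that do not come from this description, and the function names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical rule (assumption): cutoff = mean(negative) + K_SD * SD(negative).
K_SD = 3.0


def cutoff_from_negative_control(negative_od, k=K_SD):
    """Positivity cutoff from replicate negative-control optical densities."""
    if len(negative_od) < 2:
        raise ValueError("at least two negative-control replicates are needed")
    return mean(negative_od) + k * stdev(negative_od)


def read_sample(sample_od, negative_od, positive_od, k=K_SD):
    """Classify one sample; the run is rejected if the positive control fails."""
    cutoff = cutoff_from_negative_control(negative_od, k)
    if mean(positive_od) <= cutoff:
        return "invalid run: positive control below cutoff"
    return "antibodies detected" if sample_od > cutoff else "no antibodies detected"


negatives = [0.10, 0.12, 0.14]  # cutoff = 0.12 + 3 * 0.02 = 0.18
positives = [1.20, 1.30]
print(read_sample(0.50, negatives, positives))  # antibodies detected
```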

The subject of the invention is also monoclonal or polyclonal antibodies, their fragments, or chimeric antibodies, capable of specifically recognizing a product of expression in accordance with the present invention.

The present invention therefore also relates to a method for the in vitro discriminatory detection of the presence of an antigen of Mycobacterium tuberculosis, except strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, versus the presence of an antigen of Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG and Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, in a biological sample, comprising the following steps:

a) bringing the biological sample into contact with an antibody of the invention,

b) detecting the antigen-antibody complex formed.
The invention also relates to a kit for the discriminatory detection of the presence of an antigen of Mycobacterium tuberculosis, except strains of M. tuberculosis having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, versus the presence of an antigen of Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG and Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, in a biological sample, comprising the following elements:

a) an antibody as previously defined,

b) the reagents for constituting the medium suitable for the immunological reaction,

c) the reagents allowing the detection of the antigen-antibody complexes produced by the immunological reaction.

The above-mentioned reagents are well known to a person skilled in the art, who will have no difficulty adapting them to the context of the present invention.

The subject of the invention is also an immunogenic composition, characterized in that it comprises at least one product of expression in accordance with the invention. Such an immunogenic composition will be used to protect animals and humans against infections by M. africanum, M. bovis, M. canettii, M. microti and M. tuberculosis.
In a particular embodiment, such an immunogenic composition will comprise a product of expression of all or part of the nucleic acid fragment specifically deleted in the genome of Mycobacterium tuberculosis, except in Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome. In a preferred embodiment, such an immunogenic composition will comprise a product of expression of all or part of TbD1. In this case, such an immunogenic composition will be used to protect animals and humans against infections by M. africanum, M. bovis, M. canettii, M. microti and M. tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome.

In another particular embodiment, such an immunogenic composition will comprise the fusion product [mmpS6-mmpL6] of SEQ ID N°22. This fusion product results from the absence of TbD1 in M. tuberculosis, except in strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome. An immunogenic composition comprising this fusion product will be used to protect animals and humans specifically against infection by the vast majority of M. tuberculosis strains, i.e. all strains except those having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome.

Advantageously, the immunogenic composition in accordance with the invention enters into the composition of a vaccine when it is provided in combination with a pharmaceutically acceptable vehicle and, optionally, with one or more immunity adjuvant(s) such as alum, a representative of the family of muramylpeptides, or incomplete Freund's adjuvant.

The invention also relates to a vaccine comprising at least one product of expression in accordance with the invention in combination with a pharmaceutically compatible vehicle and, where appropriate, one or more appropriate immunity adjuvant(s).

The invention also provides an in vitro method for the detection and identification of Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, in a biological sample, comprising the following steps:

a) isolation of the DNA from the biological sample to be analyzed, or production of a cDNA from the RNA of the biological sample,

b) detection of the nucleic acid sequences of the mycobacterium present in said biological sample,

c) analysis for the presence or the absence of a nucleic acid fragment of the invention.

In another embodiment, the invention provides an in vitro method for the detection and identification of Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, in a biological sample, comprising the following steps:

a) bringing the biological sample to be analyzed into contact with at least one pair of primers selected from the nucleic acid fragments of the invention, and more preferably from the primers chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID N°17 and SEQ ID N°18, the DNA contained in the sample having been, where appropriate, made accessible to hybridization beforehand,

b) amplification of the DNA of the mycobacterium,

c) visualization of the amplified DNA fragments.

The invention also provides a kit for the detection and identification of Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, in a biological sample, comprising the following elements:

a) at least one pair of primers selected from the nucleic acid fragments of the invention, and more preferably from the primers chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID N°17 and SEQ ID N°18,

b) the reagents necessary to carry out a DNA amplification reaction,

c) optionally, the necessary components which make it possible to verify or compare the sequence and/or the size of the amplified fragment.

The invention also relates to a method for the in vitro detection of antibodies directed against Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, in a biological sample, comprising the following steps:

a) bringing the biological sample into contact with at least one product of expression of all or part of the nucleic acid fragment specifically deleted in M. tuberculosis, except in strains of M. tuberculosis having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome,

b) detecting the antigen-antibody complex formed.
It is also a goal of the invention to use the TbD1 deletion as a genetic marker for the differentiation of Mycobacterium strains of the Mycobacterium complex.

It is also a goal of the invention to use the mmpL6 551 polymorphism as a genetic marker for the differentiation of Mycobacterium strains of the Mycobacterium complex.

The use of such genetic marker(s) in association with at least one genetic marker selected from RD1, RD2, RD3, RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD11, RD13, RD14, RvD1, RvD2, RvD3, RvD4, RvD5, katG 463, gyrA 95, oxyR' 285, pncA 57 and the specific insertion element of M. canettii (IS canettii) allows the differentiation of Mycobacterium strains of the Mycobacterium complex (see example 4).
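Among these markers, katG 463 and gyrA 95 are single-codon polymorphisms. As an illustration, the following Python sketch assigns a strain to one of the three groups of Sreevatsan and colleagues from its two codons, using the codon and group definitions given in the caption of Figure 4. The function and table names are hypothetical, and codon combinations not listed in that caption return None.

```python
from typing import Optional

# Codons named in the Figure 4 caption, translated with the standard genetic code.
CODON_TO_AA = {"CTG": "Leu", "CGG": "Arg", "ACC": "Thr", "AGC": "Ser"}

# (katG 463 residue, gyrA 95 residue) -> group, as listed in the Figure 4 caption.
GROUP_BY_RESIDUES = {("Leu", "Thr"): 1, ("Arg", "Thr"): 2, ("Arg", "Ser"): 3}


def sreevatsan_group(katg463_codon: str, gyra95_codon: str) -> Optional[int]:
    """Return group 1, 2 or 3, or None for codons/combinations not covered."""
    residues = (CODON_TO_AA.get(katg463_codon.upper()),
                CODON_TO_AA.get(gyra95_codon.upper()))
    return GROUP_BY_RESIDUES.get(residues)


print(sreevatsan_group("CTG", "ACC"))  # 1
```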
The present invention provides an in vitro method for the detection and identification of Mycobacteria from the Mycobacterium complex in a biological sample, comprising the following steps:

a) analysis for the presence or the absence of a nucleic acid fragment specifically deleted in M. tuberculosis, except in strains of M. tuberculosis having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome, and

b) analysis of at least one additional genetic marker selected from RD1, RD2, RD3, RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD11, RD13, RD14, RvD1, RvD2, RvD3, RvD4, RvD5, katG 463, gyrA 95, oxyR' 285, pncA 57 and the specific insertion element of M. canettii.

In a preferred embodiment, two additional markers are used, preferably RD4 and RD9. The analysis is performed by a technique selected from sequence hybridization, nucleic acid amplification and detection of an antigen-antibody complex.
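The combination of TbD1 with RD9 and RD4 can be read as a small decision table. The Python sketch below follows the scheme described in the experimental data of this document (TbD1 deleted only in "modern" M. tuberculosis; RD9 deleted in the lineage represented by M. africanum, M. microti and M. bovis; neither deletion in M. canettii and ancestral M. tuberculosis). The assignment of the RD4 deletion to M. bovis and M. bovis BCG is an assumption taken from the published evolutionary scheme rather than from this section, and the function name and result strings are hypothetical.

```python
from typing import Optional


def classify_by_deletions(tbd1_deleted: bool, rd9_deleted: bool,
                          rd4_deleted: Optional[bool] = None) -> str:
    """Map TbD1/RD9/RD4 deletion status to candidate members of the complex."""
    if tbd1_deleted:
        # "Modern" M. tuberculosis keeps RD9; both deletions together do not
        # fit the scheme and the sample should be re-tested.
        if rd9_deleted:
            return "unexpected profile: re-test"
        return "'modern' M. tuberculosis"
    if not rd9_deleted:
        # Further markers (e.g. the M. canettii-specific insertion element)
        # are needed to separate these two.
        return "M. canettii or ancestral M. tuberculosis"
    if rd4_deleted is None:
        return "M. africanum, M. microti or M. bovis lineage (RD4 not tested)"
    # Assumption: the RD4 deletion marks M. bovis and M. bovis BCG.
    return "M. bovis or M. bovis BCG" if rd4_deleted else "M. africanum or M. microti"


print(classify_by_deletions(tbd1_deleted=False, rd9_deleted=True, rd4_deleted=True))
```

Treating RD4 as optional mirrors the stepwise logic of the scheme: RD4 is only informative once RD9 has placed the strain in the M. africanum/M. microti/M. bovis lineage.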

It is also a goal of the present invention to provide a kit for the detection and identification of Mycobacteria from the Mycobacterium complex in a biological sample, comprising the following elements:

a) at least one pair of primers selected from the nucleic acid fragments of the invention, and more preferably from the primers chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID N°17 and SEQ ID N°18,

b) at least one pair of primers specific for the genetic markers selected from RD1, RD2, RD3, RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD11, RD13, RD14, RvD1, RvD2, RvD3, RvD4, RvD5, katG 463, gyrA 95, oxyR' 285, pncA 57 and the specific insertion element of M. canettii,

c) the reagents necessary to carry out a DNA amplification reaction,

d) optionally, the necessary components which make it possible to verify or compare the sequence and/or the size of the amplified fragment.

In a preferred embodiment, the kit comprises the following elements:

a) at least one pair of primers selected from the nucleic acid fragments of the invention, and more preferably from the primers chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID N°17 and SEQ ID N°18,

b) one pair of primers specific for the genetic marker RD4,

c) one pair of primers specific for the genetic marker RD9,

d) the reagents necessary to carry out a DNA amplification reaction,

e) optionally, the necessary components which make it possible to verify or compare the sequence and/or the size of the amplified fragment.



The figures and examples presented below are provided as a further guide to the practitioner of ordinary skill in the art and are not to be construed as limiting the invention in any way.
FIGURES

Figure 1: Amplicons obtained from strains that have the indicated genomic region present or deleted. Sizes of amplicons in each group are uniform. Numbers correspond to strain designations used in Kremer et al. (1999, J. Clin. Microbiol. 37: 2607-2618) (Ref. 8) and Supply et al. (2001, J. Clin. Microbiol. 39: 3563-3571) (Ref. 9).



Figure 2: Sequences in the TbD1 region obtained from strains of various geographic regions.

* refers to groups based on the katG 463/gyrA 95 sequence polymorphism defined by Sreevatsan and colleagues (Ref. 2). Numbers correspond to strain designations used in Kremer et al. (1999, J. Clin. Microbiol. 37: 2607-2618) (Ref. 8) and Supply et al. (2001, J. Clin. Microbiol. 39: 3563-3571) (Ref. 9).
Figure 3: Spoligotypes of selected M. tuberculosis and M. bovis strains. Numbers correspond to strain designations used in Kremer et al. (1999, J. Clin. Microbiol. 37: 2607-2618) (Ref. 8) and Supply et al. (2001, J. Clin. Microbiol. 39: 3563-3571) (Ref. 9).

Figure 4: Scheme of the proposed evolutionary pathway of the tubercle bacilli, illustrating successive loss of DNA in certain lineages (grey boxes). The scheme is based on the presence or absence of conserved deleted regions and on sequence polymorphisms in five selected genes. Note that the distances between certain branches may not correspond to actual phylogenetic differences calculated by other methods.

Dark arrows indicate that strains are characterized by katG 463 CTG (Leu) and gyrA 95 ACC (Thr), typical of group 1 organisms. Arrows with white lines indicate that strains belong to group 2, characterized by katG 463 CGG (Arg) and gyrA 95 ACC (Thr). The arrow with white boxes indicates that strains belong to group 3, characterized by katG 463 CGG (Arg) and gyrA 95 AGC (Ser), as defined by Sreevatsan and colleagues (Sreevatsan et al., 1997, Proc. Natl. Acad. Sci. USA 94: 9869-9874) (Ref. 2).

Figure 5: Scheme of the TbD1 deletion and surrounding region in the Mycobacterium complex.

A: Scheme of TbD1 and surrounding region in the genome of M. bovis, M. bovis BCG, M. africanum, M. canettii, M. microti and ancestral strains of M. tuberculosis, characterized by having the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences inserted in their genome. The mmpL6 gene, the mmpS6 gene, the different primers, the different nucleic acid fragments and the polypeptides they encode are approximately localized in the region. The 2153 bp deletion named TbD1, specifically deleted in M. tuberculosis except in ancestral strains of M. tuberculosis, is delimited by its two end points.

B: Scheme of TbD1 and surrounding region in the genome of M. tuberculosis, except ancestral strains of M. tuberculosis. Positions of the TbD1 deletion and of the nucleic acid of sequence SEQ ID N°1 in the genome of M. tuberculosis strain H37Rv are marked below the scheme. A chimeric ORF [mmpS6-mmpL6] resulting from the absence of TbD1 is drawn; the sequence of this chimeric ORF, SEQ ID N°21, and the sequence of the encoded polypeptide, SEQ ID N°22, are approximately localized above the scheme.

Figure 6: Sequence of the specific insertion element in the genome of Mycobacterium canettii strains. The beginning of this insertion element is at position 399 and the end of this insertion element is at position 2378. This insertion element contains the coding sequence of a putative transposase (sequence in bold characters, from position 517 to position 2307) that shows significant homology with a transposase of Mycobacterium smegmatis. This coding sequence is framed by two 20 bp inverted repeats (sequences underlined, from position 399 to 418 and from position 2359 to 2378).
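The coordinates given in the caption of Figure 6 can be checked with simple arithmetic, assuming they are 1-based and inclusive so that a length is end - start + 1. The short Python check below uses only the positions quoted in that caption; it confirms the 20 bp inverted repeats and shows that the putative transposase coding sequence spans a whole number of codons. The helper name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


element = span_length(399, 2378)    # whole insertion element
cds = span_length(517, 2307)        # putative transposase coding sequence
left_ir = span_length(399, 418)     # left inverted repeat
right_ir = span_length(2359, 2378)  # right inverted repeat

print(element, cds, cds // 3, cds % 3, left_ir, right_ir)  # 1980 1791 597 0 20 20
```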



EXAMPLES 



1. MATERIAL AND METHODS: 


1.1. Bacterial strains: The 100 M. tuberculosis complex strains comprised 46 M. tuberculosis strains isolated in 30 countries, 14 M. africanum strains, 28 M. bovis strains originating in 5 countries, 2 M. bovis BCG vaccine strains (Pasteur and Japan), 5 M. microti strains and 5 M. canettii strains. The strains were isolated from human and animal sources and were selected to represent a wide diversity, including 60 strains that have been used in a multi-center study (8). The M. africanum strains were retrieved from the collection of the Wadsworth Center, New York State Department of Health, Albany, New York, whereas the majority of the M. bovis isolates came from the collection of the University of Zaragoza, Spain. Four M. canettii strains are from the culture collection of the Institut Pasteur, Paris, France. The strains have been extensively characterized by reference typing methods, i.e. IS6110 restriction fragment length polymorphism (RFLP) typing and spoligotyping. M. tuberculosis H37Rv, M. tuberculosis H37Ra, M. tuberculosis CDC 1551, M. bovis AF2122/97, M. microti OV254 and M. canettii CIPT 140010059 were included as reference strains. DNA was prepared as previously described (10).

1.2. Genome comparisons and primer design

For preliminary genome comparisons between M. tuberculosis and M. bovis, the websites http://genolist.pasteur.fr/TubercuList/ and http://www.sanger.ac.uk/Projects/M_bovis/ as well as in-house databases were used. For primer design, sequences inside or flanking RD and RvD regions were obtained from the same websites. Primers that would amplify fragments of ca. 500 bp in the reference strains (Table 1) were designed using the Primer3 website http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi.
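Primer3 chooses the primer sequences; checking the product size a chosen pair would give on a reference sequence is a separate step. The Python sketch below is a minimal exact-match in silico PCR under stated assumptions: perfect primer matches, a single top-strand template, and hypothetical toy primer sequences (not those of Table 1). Real tools also tolerate mismatches and search both strands. The toy example also shows why a deletion between flanking primers, such as TbD1, shortens the amplicon by the deleted length.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predicted_amplicon_bp(template: str, fwd: str, rev: str) -> Optional[int]:
    """Expected product size for an exact-match primer pair, or None.

    The forward primer is searched on the template as written; the reverse
    primer binds the opposite strand, so its reverse complement is searched
    downstream of the forward site.
    """
    t = template.upper()
    start = t.find(fwd.upper())
    if start < 0:
        return None
    rev_site = t.find(reverse_complement(rev), start + len(fwd))
    if rev_site < 0:
        return None
    return rev_site + len(rev) - start


# Hypothetical toy sequences: 50 bp between the primer sites, then a 40 bp deletion.
FWD, REV = "GATTAC", "CCTTAA"
intact = "TTT" + FWD + "C" * 50 + reverse_complement(REV) + "TTT"
deleted = "TTT" + FWD + "C" * 10 + reverse_complement(REV) + "TTT"
print(predicted_amplicon_bp(intact, FWD, REV), predicted_amplicon_bp(deleted, FWD, REV))  # 62 22
```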
1.3. RP-PCR analysis

Reactions were performed in 96-well plates and contained, per reaction, 1.25 µl of 10× PCR buffer (600 mM Tris-HCl pH 8.8, 20 mM MgCl2, 170 mM (NH4)2SO4, 100 mM β-mercaptoethanol), 1.25 µl of 20 mM nucleotide mix, 50 nM of each primer, 1-10 ng of template DNA, 10% DMSO, 0.2 units of Taq polymerase (Gibco-BRL) and sterile distilled water to 12.5 µl. Thermal cycling was performed on a PTC-100 amplifier (MJ Inc.) with an initial denaturation step of 90 seconds at 95°C, followed by 35 cycles of 30 seconds at 95°C, 1 min at 58°C and 4 min at 72°C.
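As a worked check of the recipe above, the final concentration of each buffer component follows from C1V1 = C2V2 with a 12.5 µl reaction volume, and the program duration follows from the stated step times. The Python sketch below only restates the quoted figures; it ignores ramp times and treats the 20 mM nucleotide mix as a single stock, since the description does not say whether that value is per nucleotide or total. Variable names are hypothetical.

```python
def final_conc(stock: float, volume_added_ul: float, total_ul: float) -> float:
    """C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock * volume_added_ul / total_ul


TOTAL_UL = 12.5   # reaction volume
BUFFER_UL = 1.25  # volume of 10x PCR buffer per reaction

buffer_10x_mM = {"Tris-HCl": 600, "MgCl2": 20, "(NH4)2SO4": 170,
                 "beta-mercaptoethanol": 100}
final_mM = {name: final_conc(c, BUFFER_UL, TOTAL_UL) for name, c in buffer_10x_mM.items()}
nucleotide_mix_mM = final_conc(20, 1.25, TOTAL_UL)

# 90 s at 95 C, then 35 x (30 s at 95 C + 60 s at 58 C + 240 s at 72 C); ramps ignored.
hold_time_s = 90 + 35 * (30 + 60 + 240)

print(final_mM)
# {'Tris-HCl': 60.0, 'MgCl2': 2.0, '(NH4)2SO4': 17.0, 'beta-mercaptoethanol': 10.0}
print(nucleotide_mix_mM, hold_time_s / 60)  # 2.0 194.0
```

The 1.25 µl in 12.5 µl ratio is exactly a 1:10 dilution, which is why every 10× buffer component ends at one tenth of its stock value.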

1.4. Sequencing of junction regions (RDs, TbD1) and of the katG, gyrA, oxyR and pncA genes

PCR products were obtained as described above, using primers listed in Table 1. To eliminate primers, 6 µl of PCR product was incubated with 1 unit of shrimp alkaline phosphatase (USB), 10 units of exonuclease I (USB) and 2 µl of 5× buffer (200 mM Tris-HCl pH 8.8, 5 mM MgCl2) for 15 min at 37°C and then for 15 min at 80°C. To this reaction mixture, 2 µl of Big Dye sequencing mix (Applied Biosystems), 2 µl (2 µM) of primer and 3 µl of 5× buffer (5 mM MgCl2, 200 mM Tris-HCl pH 8.8) were added, and 35 cycles (96°C for 30 sec; 56°C for 15 sec; 60°C for 4 min) were performed in a thermocycler (MJ Research Inc., Watertown, MA). DNA was precipitated using 80 µl of 76% ethanol, centrifuged, rinsed with 70% ethanol and dried. Reactions were dissolved in 2 µl of formamide/EDTA buffer, denatured and loaded onto 48 cm, 4% polyacrylamide gels, and electrophoresis was performed on 377 automated DNA sequencers (Applied Biosystems) for 10 to 12 h. Alternatively, reactions were dissolved in 0.3 mM EDTA buffer and subjected to automated sequencing on a 3700 DNA sequencer (Applied Biosystems). Reactions generally gave between 500 and 700 bp of unambiguous sequence.

1.5. Accession numbers

The sequence of the TbD1 region from the ancestral M. tuberculosis strain No. 74 (Ref. 8), containing genes mmpS6 and mmpL6, was deposited in the EMBL database under accession No. AJ426486. Sequences bordering RD4, RD7, RD8, RD9 and RD10 in BCG are available under accession numbers AJ003103, AJ007301, AJ131210, Y18604 and AJ132559, respectively.

2. EXPERIMENTAL DATA: 
The distribution of 20 variable regions resulting from insertion-deletion events in the genomes of the tubercle bacilli has been evaluated in a total of 100 strains of Mycobacterium tuberculosis, M. africanum, M. canettii, M. microti and M. bovis. This approach showed that the majority of these polymorphisms did not occur independently in the different strains of the M. tuberculosis complex but, rather, resulted from ancient, irreversible genetic events in common progenitor strains. Based on the presence or absence of an M. tuberculosis-specific deletion (TbD1), M. tuberculosis strains can be divided into ancestral and "modern" strains, the latter comprising representatives of major epidemics like the Beijing, Haarlem and African M. tuberculosis clusters. Furthermore, successive loss of DNA, reflected by RD9 and other subsequent deletions, was identified for an evolutionary lineage represented by M. africanum, M. microti and M. bovis that diverged from the progenitor of the present M. tuberculosis strains before TbD1 occurred. These findings contradict the often-presented hypothesis that M. tuberculosis, the etiological agent of human tuberculosis, evolved from M. bovis, the agent of bovine disease. M. canettii and ancestral M. tuberculosis strains lack none of these deleted regions and therefore appear to be direct descendants of tubercle bacilli that existed before the M. africanum → M. bovis lineage separated from the M. tuberculosis lineage. This suggests that the common ancestor of the tubercle bacilli resembled M. tuberculosis or M. canettii and could well have been a human pathogen already.

The mycobacteria grouped in the M. tuberculosis complex are characterized by 99.9% similarity at the nucleotide level and identical 16S rRNA sequences (1, 2), but differ widely in terms of their host tropisms, phenotypes and pathogenicity. Assuming that they are all derived from a common ancestor, it is intriguing that some are exclusively human (M. tuberculosis, M. africanum, M. canettii) or rodent (M. microti) pathogens, whereas others have a wide host spectrum (M. bovis). What was the genetic organization of the last common ancestor of the tubercle bacilli, and in which host did it live? Which genetic events may have contributed to the fact that the host spectrum is so different and often specific? Where and when did M. tuberculosis evolve? Answers to these questions are important for a better understanding of the pathogenicity and the global epidemiology of tuberculosis and may help to anticipate future trends in the spread of the disease.

Because of the unusually high degree of conservation in their housekeeping genes, it
has been suggested that the members of the M. tuberculosis complex underwent an
evolutionary bottleneck at the time of speciation, estimated to have occurred roughly
15,000-20,000 years ago (2). It has also been speculated that M. tuberculosis, the most widespread
etiological agent of human tuberculosis, has evolved from M. bovis, the agent of bovine
tuberculosis, by specific adaptation of an animal pathogen to the human host (3). However,
both hypotheses were proposed before the whole genome sequence of M. tuberculosis (4)
was available and before comparative genomics uncovered several variable genomic regions
in the members of the M. tuberculosis complex. Differential hybridization arrays identified
14 regions (RD1-14) ranging in size from 2 to 12.7 kb that were absent from BCG Pasteur
relative to M. tuberculosis H37Rv (5, 6). In parallel, six regions, RvD1-5 and TbD1, that
were absent from the M. tuberculosis H37Rv genome relative to other members of the M.
tuberculosis complex were revealed by comparative genomics approaches employing
pulsed-field gel electrophoresis (PFGE) techniques (5, 7) and in silico comparisons of the
near-complete M. bovis AF2122/97 genome sequence and the M. tuberculosis H37Rv
sequence.

In the present study the inventors have analyzed the distribution of these 20 variable
regions situated around the genome (Table 1) in a representative and diverse set of 100
strains belonging to the M. tuberculosis complex. The strains were isolated from different
hosts, from a broad range of geographic origins, and exhibit a wide spectrum of typing
characteristics like IS6110 and spoligotype hybridization patterns or variable-number tandem
repeats of mycobacterial interspersed repetitive units (MIRU-VNTR) (8, 9). The inventors
have found striking evidence that deletion of certain variable genomic regions did not occur
independently in the different strains of the M. tuberculosis complex and, assuming that
there is little or no recombination of chromosomal segments between the various lineages of
the complex, this allows the inventors to propose a completely new scenario for the
evolution of the M. tuberculosis complex and the origin of human tuberculosis.

Variable genomic regions and their occurrence in the members of the M. tuberculosis
complex.

The PCR screening assay for the 20 variable regions (Table 1) within 46 M.
tuberculosis, 14 M. africanum, 5 M. canettii, 5 M. microti, 28 M. bovis and 2 BCG strains
employed oligonucleotides internal to known RDs and RvDs, as well as oligonucleotides
flanking these regions (Table 1). This approach generated a large data set that was robust,
highly reliable, and internally controlled, since PCR amplicons obtained with the internal
primer pair correlated with the absence of an appropriately sized amplicon with the flanking
primer pair, and vice versa.
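The internal control described above amounts to a simple decision rule: the internal and flanking reactions should give opposite results. A minimal Python sketch, assuming each reaction is recorded as a boolean and that a flanking "junction" product means an amplicon of the size expected for the deleted region; the names `RegionCall` and `call_region` are hypothetical and do not come from the source.

```python
from enum import Enum


class RegionCall(Enum):
    PRESENT = "present"
    DELETED = "deleted"
    INCONSISTENT = "inconsistent"  # both or neither product seen: repeat the PCR


def call_region(internal_product: bool, junction_product: bool) -> RegionCall:
    """Combine the two PCR read-outs for one variable region.

    internal_product: an amplicon was obtained with the primers internal to the region.
    junction_product: the flanking primers gave an amplicon of the size expected
        when the region is deleted (hypothetical encoding chosen for this sketch).
    The two results are expected to be mutually exclusive, which is what makes
    the assay internally controlled.
    """
    if internal_product and not junction_product:
        return RegionCall.PRESENT
    if junction_product and not internal_product:
        return RegionCall.DELETED
    return RegionCall.INCONSISTENT


# Example: one region typed in two hypothetical isolates.
print(call_region(internal_product=False, junction_product=True).value)  # deleted
print(call_region(internal_product=True, junction_product=False).value)  # present
```

Treating "both" and "neither" as a single inconsistent outcome reflects the point of the design: a result is only accepted when the two primer pairs agree.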

According to the conservation of junction sequences flanking the variable regions,
three types of regions were distinguished, each having different importance as an
evolutionary marker. The first type included mobile genetic elements, like the prophages
phiRv1 (RD3) and phiRv2 (RD11) and insertion sequences IS1532 (RD6) and IS6110
(RD5), whose distribution in the tubercle bacilli was highly divergent (Table 2). The second
type of deletion is mediated by homologous recombination between adjacent IS6110
insertion elements, resulting in the loss of the intervening DNA segment (RvD2, RvD3,
RvD4, and RvD5 (7)), and is variable from strain to strain (Table 2).

The third type includes deletions whose bordering genomic regions typically do not
contain repetitive sequences. This type of deletion often occurred in coding regions, resulting
in the truncation of genes that are still intact in other strains of the M. tuberculosis complex.
The exact mechanism leading to this type of deletion remains obscure, but rare
strand slippage errors of DNA polymerase may have contributed to this event. As shown in
detail below, RD1, RD2, RD4, RD7, RD8, RD9, RD10, RD12, RD13, RD14, and TbD1 are
representatives of this third group, whose distribution among the 100 strains allows us to
propose an evolutionary scenario for the members of the M. tuberculosis complex that
identifies M. tuberculosis and/or M. canettii as most closely related to the common ancestor
of the tubercle bacilli.

2.1. M. tuberculosis strains: 

Investigation of the 46 M. tuberculosis strains by deletion analysis revealed that most
RD regions were present in all M. tuberculosis strains tested (Table 2). Only regions RD3
and RD11, corresponding to the two prophages phiRv1 and phiRv2 of M. tuberculosis
H37Rv (4), RD6 containing the insertion sequence IS1532, and RD5 that is flanked by a
copy of IS6110 (5) were absent in some strains. This is an important observation as it implies
that M. tuberculosis strains are highly conserved with respect to RD1, RD2, RD4, RD7,
RD8, RD9, RD10, RD12, RD13, and RD14, and that these RDs represent regions that can
differentiate M. tuberculosis strains, independent of their geographical origin and their typing
characteristics, from certain other members of the M. tuberculosis complex. Furthermore, this
suggests that these regions may be involved in the host specificity of M. tuberculosis.

In contrast, the presence or absence of RvD regions in M. tuberculosis strains was
variable. The region which showed the greatest variability was RvD2, since 18 of the 46
tested M. tuberculosis strains did not carry the RvD2 region. Strains with a high copy
number of IS6110 (>14) lacked regions RvD2 to RvD5 more often than strains with only a
few copies. As an example, all six tested strains belonging to the Beijing cluster (8) lacked
regions RvD2 and RvD3. This is in agreement with the proposed involvement of
recombination between two adjacent copies of IS6110 in this deletion event (7).
However, the most surprising finding concerning the RvD regions was that TbD1 was
absent from 40 of the tested M. tuberculosis strains (87%), including representative strains
from major epidemics such as the Haarlem, Beijing and Africa clusters (8). To accentuate
this result we named this region "M. tuberculosis specific deletion 1" (TbD1). In silico
sequence comparison of M. tuberculosis H37Rv with the corresponding section in M. bovis
AF2122/97 revealed that in M. bovis this locus comprises two genes encoding membrane
proteins belonging to a large family, whereas in M. tuberculosis H37Rv one of these genes
(mmpS6) was absent and the second was truncated (mmpL6). Unlike the RvD2-RvD5
deletions, the TbD1 region is not flanked by a copy of IS6110 in M. tuberculosis H37Rv,
suggesting that insertion elements were not involved in the deletion of the 2153 bp fragment.
To further investigate whether the 40 M. tuberculosis strains lacking the TbD1 region had
the same genomic organization of this locus as M. tuberculosis H37Rv, we amplified the
TbD1 junction regions of the various strains by PCR using primers flanking the deleted
region (Table 1). This approach showed that the size of the amplicons obtained from
multiple strains was uniform (Fig. 1), and subsequent sequence analysis of the PCR products
revealed that in all tested TbD1-deleted strains the sequence of the junction regions was
identical to that of M. tuberculosis H37Rv (Fig. 2). The perfect conservation of the junction
sequences in TbD1-deleted strains of wide geographical diversity suggests that the genetic
event which resulted in the deletion occurred in a common progenitor. However, six M.
tuberculosis strains, all characterized by very few or no copies of IS6110 and spoligotypes
that resembled each other (Fig. 3), still had the TbD1 region present. Interestingly, these six
strains were also clustered together by MIRU-VNTR analysis (9).

Analysis of partial gene sequences of oxyR, pncA, katG, and gyrA, which have been
described as variable between different tubercle bacilli (2, 11, 12, 13), revealed that all tested
M. tuberculosis strains showed oxyR and pncA partial sequences typical for M. tuberculosis
(oxyR nucleotide 285 (oxyR285): G; pncA codon 57 (pncA57): CAC). Based on the katG
codon 463 (katG463) and gyrA codon 95 (gyrA95) sequence polymorphisms, Sreevatsan and
colleagues (2) defined three groups among the tubercle bacilli: group 1 showing katG463
CTG (Leu), gyrA95 ACC (Thr); group 2 exhibiting katG463 CGG (Arg), gyrA95 ACC (Thr);
and group 3 showing katG463 CGG (Arg), gyrA95 AGC (Ser). According to this scheme, in
our study 16 of the 46 tested M. tuberculosis strains belonged to group 1, whereas 27 strains
belonged to group 2 and only 3 isolates to group 3. Of the 40 strains that were deleted for
region TbD1, 9 showed characteristics of group 1, including the strains belonging to the
Beijing cluster, 28 of group 2, including the strains from the Haarlem and Africa clusters, and
3 of group 3, including H37Rv and H37Ra. Most interestingly, all six M. tuberculosis strains
in which the TbD1 region was not deleted contained a leucine (CTG) at katG463, which was
described as characteristic for ancestral M. tuberculosis strains (group 1) (2). As shown in
Figure 4, this suggests that during the evolution of M. tuberculosis the katG mutation at
codon 463 CTG (Leu) -> CGG (Arg) occurred in a progenitor strain that had region TbD1
deleted. This proposal is supported by the finding that strains belonging to group 1 may or
may not have deleted region TbD1, whereas all 30 strains belonging to groups 2 and 3 lacked
TbD1 (Fig. 4). Furthermore, all strains of groups 2 and 3 characteristically lacked spacer
sequences 33-36 in the direct repeat (DR) region (Fig. 3). It appears that such spacers may be
lost but not gained (14). Therefore, TbD1-deleted strains will be referred to hereafter as
"modern" M. tuberculosis strains.
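The three-group scheme of Sreevatsan and colleagues, together with the ancestral/"modern" label based on TbD1, can be written as a lookup table. A minimal Python sketch, assuming codons are supplied as plain DNA triplets; the names `SREEVATSAN_GROUPS`, `sreevatsan_group` and `tb_lineage` are hypothetical.

```python
from typing import Optional

# (katG codon 463, gyrA codon 95) -> group of Sreevatsan and colleagues,
# transcribed from the paragraph above.
SREEVATSAN_GROUPS = {
    ("CTG", "ACC"): 1,  # katG463 Leu, gyrA95 Thr
    ("CGG", "ACC"): 2,  # katG463 Arg, gyrA95 Thr
    ("CGG", "AGC"): 3,  # katG463 Arg, gyrA95 Ser
}


def sreevatsan_group(katg463: str, gyra95: str) -> Optional[int]:
    """Return 1, 2 or 3, or None for a codon pair outside the scheme."""
    return SREEVATSAN_GROUPS.get((katg463.upper(), gyra95.upper()))


def tb_lineage(tbd1_present: bool) -> str:
    """Label used here for M. tuberculosis: TbD1-deleted strains are "modern"."""
    return "ancestral" if tbd1_present else "modern"


# A hypothetical H37Rv-like record: group 3 and TbD1-deleted.
print(sreevatsan_group("CGG", "AGC"), tb_lineage(False))  # 3 modern
```

Returning `None` for an unlisted codon pair (for example CTG with AGC) keeps combinations that the scheme does not describe from being forced into a group.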

2.2. M. canettii:

M. canettii is a very rare smooth variant of M. tuberculosis, usually isolated from
patients from, or with a connection to, Africa. Although it shares identical 16S rRNA
sequences with the other members of the M. tuberculosis complex, M. canettii strains differ
in many respects, including polymorphisms in certain housekeeping genes, IS1081 copy
number, colony morphology, and the lipid content of the cell wall (15, 16). Therefore, we
were surprised to find that in M. canettii all the RD, RvD, and TbD1 regions except the
prophages (phiRv1, phiRv2) were present. In contrast, we identified a region (RDcan) that is
specifically absent from all five M. canettii strains and partially overlaps RD12 (Fig. 4).

The conservation of the RD, RvD, and TbD1 regions in the genome of M. canettii, in
conjunction with the many described and observed differences, suggests that M. canettii
diverged from the common ancestor of the M. tuberculosis complex before the RD, RvD and
TbD1 deletions occurred in the lineages of tubercle bacilli (Fig. 4). This hypothesis is
supported by the finding that M. canettii was shown to carry 26 unique spacer sequences in
the direct repeat region (14) that are no longer present in any other member of the M.
tuberculosis complex. Another specific feature of M. canettii is the presence of an insertion
element whose sequence has been searched for, using PCR and hybridization approaches,
without success in the other member strains of the M. tuberculosis complex (including M.
tuberculosis, M. bovis, M. africanum and M. microti). This insertion element contains an ORF
encoding a putative transposase framed by two inverted repeats. The sequence of this
insertion element is represented in figure 6 and in SEQ ID N°19, where it begins at position
399 and ends at position 2378. The amino acid sequence of the putative transposase is given
in SEQ ID N°20. As such, this insertion element can be used to differentiate between
ancestral M. tuberculosis strains and M. canettii strains, which may show the same TbD1,
RD4 and RD9 profiles. M. canettii therefore represents a fascinating tubercle bacillus, whose
detailed genomic analysis may reveal further insights into the evolution of the M.
tuberculosis complex.

2.3. M. africanum:

The isolates designated as M. africanum studied here originate from West and East
African sources. Eleven strains were isolated in Sierra Leone, Nigeria and Guinea and 2 strains in
Uganda. One strain comes from the Netherlands.

For the 11 West African isolates, RD analysis indicated that these strains all lack the
RD9 region containing cobL. Sequence analysis of the RD9 junction region showed that the
genetic organization of this locus in West African strains was identical to that of M. bovis
and M. microti, in that the 5' part of cobL as well as the genes Rv2073c and Rv2074c were
absent. In addition, six strains (2 from Sierra Leone, 4 from Guinea) also lacked RD7, RD8
and RD10 (Table 2). The junction sequences bordering RD7, RD8 and RD10, like those for
RD9, were identical to those of M. bovis and M. microti strains. As regards the two
prophages phiRv1 and phiRv2, the West African strains all contained phiRv2, whereas
phiRv1 was absent. No variability was seen for the RvD regions: RvD1-RvD5 and TbD1
were present in all tested West African strains. This shows that M. africanum prevalent in
West Africa can be differentiated from "modern" M. tuberculosis by at least two variable
genetic markers, namely the absence of region RD9 and the presence of region TbD1.

In contrast, for East African M. africanum and for the isolate from the Netherlands,
no genetic marker was found which could differentiate them from M. tuberculosis strains.
With the exception of prophage phiRv1 (RD3), the 3 strains from Uganda and the
Netherlands did not exhibit any of the RD deletions, but lacked the TbD1 region, as do
"modern" M. tuberculosis strains. The absence of the TbD1 region was also confirmed by
sequence analysis of the TbD1 junction region, which was found to be identical to that of
TbD1-deleted M. tuberculosis strains. These results indicate a very close genetic relationship
of these strains to M. tuberculosis and suggest that they should be regarded as M.
tuberculosis rather than M. africanum strains.

2.4. M. microti:

M. microti strains were isolated in the 1930s from voles (17) and more recently from
immunosuppressed patients (18). These strains are characterized by an identical,
characteristic spoligotype, but differ in their IS6110 profiles. Both the vole and the human
isolates lacked regions RD7, RD8, RD9, and RD10 as well as a region that is specifically
deleted from M. microti (RDmic). RDmic was revealed by a detailed comparative genomics
study of M. microti isolates (19) using clones from an M. microti Bacterial Artificial
Chromosome (BAC) library. RDmic partially overlaps RD1 from BCG (data not shown).
Furthermore, vole isolates lacked part of the RD5 region, whereas this region was present in
the human isolate. As the junction region of RD5 in M. microti was different from that in BCG
(data not shown), RD5 was not used as an evolutionary marker.

2.5. M. bovis and M. bovis BCG:

M. bovis has a very large host spectrum, infecting many mammalian species,
including man. The collection of M. bovis strains that was screened for the RD and RvD
regions consisted of 2 BCG strains and 18 "classical" M. bovis strains, generally
characterized by only one or two copies of IS6110, from bovine, llama and human sources, in
addition to three goat isolates, three seal isolates, two oryx isolates, and two M. bovis strains
from humans that presented a higher number of IS6110 copies.

Excluding prophages, the distribution of RDs allowed us to differentiate five main
groups among the tested M. bovis strains. The first group was formed by strains that lack
RD7, RD8, RD9, and RD10. Representatives of this group are three seal isolates and two
human isolates containing between three and five copies of IS6110 (data not shown). Two
oryx isolates harboring between 17 and 20 copies of IS6110 formed the second group, which
lacked parts of RD5 in addition to RD7-RD10 and very closely resembled the M. microti
isolates. However, they did not show RDmic, the deletion characteristic of M. microti strains
(data not shown). Analysis of partial oxyR and pncA sequences from strains belonging to
groups one and two showed sequence polymorphisms characteristic of M. tuberculosis
strains (oxyR285: G; pncA57: CAC; refs. 12, 13).

Group three consists of goat isolates that lack regions RD5, RD7, RD8, RD9, RD10,
RD12, and RD13. As previously described by Aranaz and colleagues, these strains exhibited
an adenine at position 285 of the oxyR pseudogene that is specific for "classical" M. bovis
strains, whereas the sequence of the pncA57 polymorphism was identical to that in M.
tuberculosis (20). This is in good agreement with our results from sequence analysis (Table
2) and the finding that, except for RD4, the goat isolates displayed the same deletions as
"classical" M. bovis strains. Taken together, this suggests that the oxyR285 mutation (G ->
A) occurred in M. bovis strains before RD4 was lost. Interestingly, the most common M.
bovis strains ("classical" M. bovis (21)), isolated from cattle from Argentina, the
Netherlands, the UK and Spain, as well as from humans (e.g. multi-drug resistant M. bovis
from Spain (22)), showed the greatest number of RD deletions and appear to have undergone
the greatest loss of DNA relative to other members of the M. tuberculosis complex. These
lacked regions RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD12 and RD13, confirming
results obtained with reference strains (5, 6). These strains, together with the two BCG strains,
were the only ones that showed the pncA57 polymorphism GAC (Asp) in addition to the
oxyR285 mutation (G -> A) characteristic of M. bovis. Analysis of BCG strains indicates that
BCG lacked the same RD regions as "classical" M. bovis strains, in addition to RD1, RD2
and RD14, whose loss apparently occurred during and after the attenuation process (Fig. 4)
(6, 23).

In contrast to RDs, the RvD regions were highly conserved in the M. bovis strains.
With the exception of the two IS6110-rich oryx isolates, which lacked RvD2, RvD3 and RvD4,
all other strains had the five RvD regions present. It is particularly noteworthy that TbD1
was present in all M. bovis strains.

However, except for the two human isolates of group 1 containing between three and five
copies of IS6110, strains designated as M. bovis showed a single nucleotide
polymorphism in the TbD1 region at codon 551 (AAG) of the mmpL6 gene, relative to M.
canettii, M. africanum and ancestral M. tuberculosis strains, which are characterized by
codon AAC. Even the strains isolated from seals and from oryx, with oxyR and pncA loci like
those of M. tuberculosis and with fewer deleted regions than the classical M. bovis strains,
showed the mmpL6 codon 551 AAG polymorphism typical for M. bovis and M. microti (Table 2,
Fig. 4). As such, this polymorphism could serve as a very useful genetic marker for the
differentiation of strains that lack RD7, RD8, RD9, and RD10 and have been classified as M.
bovis or M. africanum, but may differ from other strains of the same taxon.
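The single-nucleotide markers discussed for the M. bovis groups (oxyR285, pncA57 and mmpL6 codon 551) can be read out together. A minimal Python sketch, assuming each allele is supplied as text and that mmpL6 codon 551 cannot be typed in TbD1-deleted strains because the text places it inside the TbD1 region; `SnpProfile`, `interpret` and the label strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allele interpretations as stated in the text; anything else is "unrecognised".
OXYR285 = {"G": "M. tuberculosis-type", "A": "M. bovis-type"}
PNCA57 = {"CAC": "M. tuberculosis-type", "GAC": "classical M. bovis/BCG-type"}
MMPL6_551 = {
    "AAC": "M. canettii/M. africanum/ancestral M. tuberculosis-type",
    "AAG": "M. bovis/M. microti-type",
}


@dataclass(frozen=True)
class SnpProfile:
    oxyr285: str              # base at oxyR nucleotide 285
    pnca57: str               # codon 57 of pncA
    mmpl6_551: Optional[str]  # codon 551 of mmpL6; None if TbD1 is deleted


def interpret(profile: SnpProfile) -> List[str]:
    """Return one human-readable note per marker."""
    notes = [
        "oxyR285 " + OXYR285.get(profile.oxyr285.upper(), "unrecognised"),
        "pncA57 " + PNCA57.get(profile.pnca57.upper(), "unrecognised"),
    ]
    if profile.mmpl6_551 is None:
        notes.append("mmpL6-551 not typable (TbD1 region absent)")
    else:
        notes.append("mmpL6-551 " + MMPL6_551.get(profile.mmpl6_551.upper(), "unrecognised"))
    return notes


# A hypothetical seal-type isolate: M. tuberculosis-like oxyR/pncA, M. bovis-like mmpL6.
for note in interpret(SnpProfile(oxyr285="G", pnca57="CAC", mmpl6_551="AAG")):
    print(note)
```

The seal-type example shows why the mmpL6 codon is useful on its own: the two older markers point toward M. tuberculosis, while codon 551 places the isolate with M. bovis and M. microti.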



3. DISCUSSION 


3.1. Origin of human tuberculosis 

For many years, it was thought that human tuberculosis evolved from the bovine
disease by adaptation of an animal pathogen to the human host (3). This hypothesis is based
on the property of M. tuberculosis to be almost exclusively a human pathogen, whereas M.
bovis has a much broader host range. However, the results from this study unambiguously
show that M. bovis has undergone numerous deletions relative to M. tuberculosis. This is
confirmed by the preliminary analysis of the near-complete genome sequence of M. bovis
AF2122/97, a "classical" M. bovis strain isolated from cattle, which revealed no new gene
clusters that were confined specifically to M. bovis. This indicates that the genome of M.
bovis is smaller than that of M. tuberculosis (24). It seems plausible that M. bovis is the final
member of a separate lineage represented by M. africanum (RD9), M. microti (RD7, RD8,
RD9, RD10) and M. bovis (RD4, RD5, RD7, RD8, RD9, RD10, RD12, RD13) (25) that
branched from the progenitor of M. tuberculosis isolates. Successive loss of DNA may have
contributed to clonal expansion and the appearance of more successful pathogens in certain
new hosts.

Whether the progenitor of extant M. tuberculosis strains was already a human
pathogen when the M. africanum -> M. bovis lineage separated from the M. tuberculosis
lineage is a subject for speculation. However, we have two reasons to believe that this was
the case. Firstly, the six ancestral M. tuberculosis strains (TbD1+, RD9+) (Fig. 3) that
resemble the last common ancestor before the separation of M. tuberculosis and M.
africanum are all human pathogens. Secondly, M. canettii, which probably diverged from the
common ancestor of today's M. tuberculosis strains prior to any other known member of the
M. tuberculosis complex, is also a human pathogen. Taken together, this means that those
tubercle bacilli which are thought to most closely resemble the progenitor of M. tuberculosis
are human and not animal pathogens. It is also intriguing that most of these strains were of
African or Indian origin (Fig. 3). It is likely that these ancestral strains predominantly
originated from endemic foci (15, 26), whereas "modern" M. tuberculosis strains that have
lost TbD1 may represent epidemic M. tuberculosis strains that were introduced into the same
geographical regions more recently as a consequence of the worldwide spread of the
tuberculosis epidemic.

3.2. The evolutionary timescale of the M. tuberculosis complex

Because of the high sequence conservation in housekeeping genes, Sreevatsan et al.
previously hypothesized that the tubercle bacilli encountered a major bottleneck
15,000-20,000 years ago (2). As the conservation of the TbD1 junction sequence in all tested
TbD1-deleted strains suggests descent from a single clone, the TbD1 deletion is a perfect
indicator that "modern" M. tuberculosis strains, which account for the vast majority of today's
tuberculosis cases, definitely underwent such a bottleneck and then spread around the world.

As described in detail in the results section, our analysis showed that the katG463
CTG -> CGG and the subsequent gyrA95 ACC -> AGC mutations, which were used by
Sreevatsan and colleagues to designate groups 2 and 3 of their proposed evolutionary
pathway of the tubercle bacilli (2), occurred in a lineage of M. tuberculosis strains that had
already lost TbD1 (Fig. 4). Although deletions are more stable markers than point mutations,
which may be subject to reversion, a perfect correlation of deletion and point mutation data
was found for the tested strains.
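The ordering argument above (the katG463 change arose in a lineage that had already lost TbD1, and deletions are not reversed) implies a testable constraint: no strain should carry katG463 CGG while still carrying TbD1. A minimal Python sketch of that consistency check, assuming per-strain records of TbD1 status and the katG463 codon; `Strain`, `violations` and the strain names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Strain(NamedTuple):
    name: str
    tbd1_present: bool
    katg463: str  # "CTG" (ancestral) or "CGG" (derived)


def violations(strains: Iterable[Strain]) -> List[str]:
    """Names of strains that contradict the proposed order of events.

    Assumes, as argued in the text, that the katG463 CTG -> CGG change arose in a
    progenitor that had already lost TbD1 and that deletions are not reversed,
    so a CGG strain that still carries TbD1 would be unexpected.
    """
    return [s.name for s in strains
            if s.katg463.upper() == "CGG" and s.tbd1_present]


# Hypothetical records: only "strain_D" breaks the expected pattern.
records = [
    Strain("strain_A", True, "CTG"),   # ancestral, group 1
    Strain("strain_B", False, "CTG"),  # modern, group 1
    Strain("strain_C", False, "CGG"),  # modern, group 2 or 3
    Strain("strain_D", True, "CGG"),   # would contradict the scenario
]
print(violations(records))
```

An empty result over a strain collection corresponds to the "perfect correlation" reported in the text; any listed name would point to a strain worth re-typing.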
This information, together with results from a recent study by Fletcher and
colleagues (27), who have shown that M. tuberculosis DNAs amplified from naturally
mummified Hungarian villagers from the 18th and 19th centuries belonged to katG463/gyrA95
groups 2 and 3, suggests that the TbD1 deletion occurred in the lineage of M. tuberculosis
before the 18th century. This could mean that the dramatic increase of tuberculosis cases later
in the 18th century in Europe mainly involved "modern" M. tuberculosis strains. In addition,
it shows that tuberculosis was caused by M. tuberculosis and not by M. bovis, a fact which is
also described for cases in rural medieval England (28).

There is good evidence that mycobacterial infections occurred in man several
thousand years ago. We know that tuberculosis occurred in Egypt during the reign of the
pharaohs because spinal and rib lesions pathognomonic of tuberculosis have been identified
in mummies from that period (29). Identification of acid-fast bacilli as well as PCR
amplification of IS6110 from Peruvian mummies (30) also suggest that tuberculosis existed
in pre-Columbian societies of Central and South America. To estimate when the TbD1
bottleneck occurred, it would now be very interesting to know whether the Egyptian and
South American mummies carried M. tuberculosis DNA that had TbD1 deleted or not.

The other major bottleneck, which seems to have occurred for members of the M.
africanum -> M. microti -> M. bovis lineage, is reflected by RD9 and the subsequent RD7,
RD8 and RD10 deletions (Fig. 4). These deletions seem to have occurred in the progenitor of
tubercle bacilli that today show natural host spectra as diverse as humans in Africa, voles
on the Orkney Isles (UK), seals in Argentina, goats in Spain, and badgers in the UK. For this
reason it is difficult to imagine that the spread and adaptation of RD9-deleted bacteria to their
specific hosts could have happened within the postulated 15,000-20,000 years of speciation
of the M. tuberculosis complex.

However, more insight into this matter could be gained by RD analysis of ancient
DNA samples, e.g. mycobacterial DNA isolated from a 17,000-year-old bison skeleton (31).
The mycobacterium whose DNA was amplified showed a spoligotype that was most closely
related to patterns of M. africanum and could have been an early representative of the
M. africanum -> M. bovis lineage. With the TbD1 and RD9 junction sequences that we
supply here, PCR analyses of ancient DNAs should enable very focused studies to be
undertaken to learn more about the timescale within which the members of the M.
tuberculosis complex have evolved.

3.3. Concluding comments 

Our study provides an overview of the diversity and conservation of variable regions
in a broad range of tubercle bacilli. Deletion analysis of 100 strains from various hosts and
countries has identified some evolutionarily "old" M. canettii, M. tuberculosis and M.
africanum strains, most of them of African origin, as well as "modern" M. tuberculosis
strains, the latter including representatives from major epidemic clusters like Beijing,
Haarlem and Africa. The use of deletion analysis in conjunction with molecular typing and
analysis of specific mutations was shown to represent a very powerful approach for the study
of the evolution of the tubercle bacilli and for the identification of evolutionary markers. From a
more practical perspective, these regions, primarily RD9 and TbD1 but also RD1, RD2,
RD4, RD7, RD8, RD10, RD12 and RD13, represent very interesting candidates for the
development of powerful diagnostic tools for the rapid and unambiguous identification of
members of the M. tuberculosis complex (32). This genetic approach for differentiation can
now be used to replace the often confusing traditional division of the M. tuberculosis
complex into rigidly defined subspecies.

Moreover, functional analyses will show whether the TbD1 deletion confers some
selective advantage to "modern" M. tuberculosis, or whether other circumstances contributed
to the pandemic of the TbD1-deleted M. tuberculosis strains.

EXAMPLE 4 


The members of the M. tuberculosis complex share an unusually high degree of
conservation, such that the commercially available nucleic acid probes and amplification
assays cannot differentiate these organisms. In addition, conventional identification methods
are often ambiguous, cumbersome and time-consuming because of the slow growth of the
organisms.

In the present invention the inventors, by a deletion analysis, solve the problem faced
by clinical mycobacteriology laboratories for differentiation within the M. tuberculosis
complex.

This approach makes it possible to perform a diagnosis on a biological fluid by using at least
three markers, including TbD1. The following Table 3 illustrates such a combination,
sufficient to distinguish between the members of the M. tuberculosis complex.
Table 3

MYCOBACTERIUM STRAIN | MARKERS: RD4 | RD9 | TbD1
M. bovis BCG
M. bovis
M. africanum
M. tuberculosis
M. tuberculosis ancestral
M. canettii



Besides the TbD1 marker, at least 2 other markers should preferably be used. Examples of such
additional markers available in the literature are listed in the following Table 1.
Although ancestral strains of Mycobacterium tuberculosis represent only 5% of all
Mycobacterium tuberculosis strains, persons who would be interested in distinguishing the
ancestral strains of Mycobacterium tuberculosis from the strains of Mycobacterium canettii
could consider using the genetic marker RD12 in combination with the three markers
described in Table 3. Because region RDcan partially overlaps RD12 in the genome of
Mycobacterium canettii, the RD12 flanking primers described in Table 1 do not hybridize to
the genomic DNA of Mycobacterium canettii. Therefore, PCR amplification with these flanking
primers results in a 2.8 kb PCR product in Mycobacterium tuberculosis and no PCR product in
Mycobacterium canettii.

Another way to distinguish ancestral strains of Mycobacterium tuberculosis from
Mycobacterium canettii would be the detection of the insertion element specific for M.
canettii strains and corresponding to SEQ ID N°19.
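The marker logic of this example can be summarised as a decision procedure. The sketch below is assembled from the marker statements in the text (TbD1, RD9, RD4, the extra BCG deletion RD1, the RD12 flanking PCR and the M. canettii-specific insertion element). It is a hypothetical Python illustration under those assumptions, not a validated diagnostic algorithm. Note that M. microti and the non-classical M. bovis groups also retain RD4 while lacking RD9, so that outcome is labelled broadly. The function name `identify` and its parameters are hypothetical.

```python
from typing import Optional


def identify(tbd1: bool, rd9: bool, rd4: bool,
             rd1: Optional[bool] = None,
             rd12_flank_2_8kb: Optional[bool] = None,
             canettii_is: Optional[bool] = None) -> str:
    """Sketch of deletion-based identification; True = region or product present.

    rd12_flank_2_8kb: 2.8 kb product with the RD12 flanking primers (M. tuberculosis),
        no product expected in M. canettii. canettii_is: the SEQ ID N°19 element
        was detected. None means "not tested".
    """
    if not tbd1:
        return "M. tuberculosis (modern, TbD1-deleted)"
    if rd9:
        # TbD1+ RD9+: ancestral M. tuberculosis and M. canettii look alike here.
        if canettii_is:
            return "M. canettii"
        if rd12_flank_2_8kb is True:
            return "M. tuberculosis (ancestral)"
        if rd12_flank_2_8kb is False:
            return "M. canettii"
        return "ancestral M. tuberculosis or M. canettii (test RD12 or the IS element)"
    if rd4:
        # Shared by West African M. africanum, M. microti and non-classical M. bovis.
        return "RD9-deleted lineage, RD4 intact (e.g. M. africanum)"
    if rd1 is False:
        return "M. bovis BCG"
    if rd1 is True:
        return "M. bovis"
    return "M. bovis or M. bovis BCG (test RD1)"


print(identify(tbd1=True, rd9=True, rd4=True, rd12_flank_2_8kb=False))  # M. canettii
```

TbD1 is checked first because, per the text, its absence alone places a strain among the "modern" M. tuberculosis strains; the optional arguments are consulted only when the first three markers leave two candidates.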

Supplemental data: 




Table 1: RD, RvD and TbD1 regions and selected primers

Region absent from BCG | Gene | Size (kb) | Internal primer pair | Flanking primers or 2nd internal* primer pair
RD1 | Rv3871-Rv3879c | 9.5 | RD1in-Rv3878F GTC AGC CAA GTC AGG CTA CC; RD1in-Rv3878R CAA CGT TGT GGT TGT TGA GG | RD1-flank.left GAA ACA GTC CCC AGC AGG T; RD1-flank.right TTC AAC GGG TTA CTG CGA AT
RD2 | Rv1978-Rv1988 | 10.8 | RD2-Rv1979.int.F TAT AGC TCT CGG CAG GTT CC; RD2-Rv1979.int.R ATC GGC ATC TAT GTC GGT GT | RD2-flank.F CTC GAC CGC GAC GAT GTG C; RD2-flank.R CCT CGT TGT CAC CGC GTA TG
RD3 | Rv1573-Rv1586c | 9.2 | RD3-Rv1586.int.F TTA TCT TGG CGT TGA CGA TG; RD3-Rv1586.int.R CAT ATA AGG GTG CCC GCT AC | RD3-int-REP.F CTG ACG TCG TTG TCG AGG TA*; RD3-int-REP.R GTA CCC CCA GGC GAT CTT*
RD4 | Rv1505c-Rv1516c | 12.7 | RD4-Rv1516.int.F CAA GGG GTA TGA GGT TCA CG; RD4-Rv1516.int.R CGG TGA TTC GTG ATT GAA CA | RD4-flank.F CTC GTC GAA GGC CAC TAA AG; RD4-flank.R AAG GCG AAC AGA TTC AGC AT
Table 1 (continued)

RD5* | Rv2346c-Rv2353c | 9.0 | RD5A-Rv2348.int.F AAT CAC GCT OCT GCT ACT CC; RD5A-Rv2348.int.R GTG CTT TTG CCT CTT GGT C | RD5B-plcA.int.F CAA GTT GGG TCT GGT CGA AT; RD5B-plcA.int.R GCT ACC CAA GGT CTC CTG GT
RD6* | Rv3425-Rv3428c | 4.9 | RD6-IS1532F CAG CTG GTG AGT TCA AAT GC; RD6-IS1532R CTC CCG ACA CCT GTT CGT | ND
RD7 | Rv1964-Rv1977 | 12.7 | RD7-Rv1976.int.F TGG ATT GTC GAC GGT ATG AA; RD7-Rv1976.int.R GGT CGA TAA GGT CAC GGA AC | RD7-flank.F GGT AAT CGT GGC CGA CAA G; RD7-flank.R CAG CTC TTC CCC TCT CGA C
RD8 | ephA-lpqG | 5.9 | RD8-ephA.F GGT GTG ATT TGG TGA GAC GAT G; RD8-ephA.R AGT TCC TCC TGA CTA ATC CAG GC | RD8-flank.F CAA TCA GGG CTG TGC TAA CC; RD8-flank.R CGA CAG TTG TGC GTA CTG GT
RD9 | cobL-Rv2075 | 2.0 | RD9-intF CGA TGG TCA ACA CCA CTA CG; RD9-intR CTG GAC CTC GAT GAC CAC TC | RD9-flankF GTG TAG GTC AGC CCC ATC C; RD9-flankR GCC CAA CAG CTC GAC ATC
RD10 | Rv0221-Rv0223 | 1.9 | RD10-intF GTA ACC GCT TCA CCG GAA T; RD10-intR GTC AAC TCC ACG GAA AGA CC | RD10-flankF CTG CAA CCA TCC GGT ACA C; RD10-flankR GTC ATG AAC GCC GGA CAG
RD11 | Rv2645-Rv2695c | 11.0 | RD11-Rv2646F CGG CAG CTA GAC GAC CTC; RD11-Rv2646R AAC GTG CTG CGA TAG GTT TT | RD11-fla-F TCA CAT AGG GGC TGC GAT AG; RD11-fla-R AGA GGA ACC TTT CGG TGG TT
RD12 | sseC-Rv3121 | 2.8 | RD12-Rv3120.int.F GAA ATA CGA GTG CGC TGA CC; RD12-Rv3120.int.R CTC TGA ACC ATC GGT GTC G | RD12-flank.F GCC ATC AAC GTC AAG AAC CT; RD12-flank.R CGG CCA GGT AAC AAG GAG T
RD13 | Rv1255c-Rv1257c | 3.0 | RD13intF GGA TGT CAC TCG GAA CGG CA; RD13intR CAC CGG GCT GAT CGA GCG A | RD13-flank.F CGA TGG TGT TTC TTG GTG AG; RD13-flank.R GGA TCG GCT CAG TGA ATA CC
RD14 | Rv1765c-Rv1773c | 9.0 | RD14-Rv1769.int.F GTG GAG CAC CTT GAC CTG AT; RD14-Rv1769.int.R CGT CGA ATA CGA GTC GAA CA | RD14-flankF TTG ATT CGC CAA CAA CTG AA; RD14-flankR GGG CTG GTT AGT GTC GAT TC
Table 1 (continued) 

Region missing from M. tuberculosis H37Rv 

RvD1*: 5.0 kb 
  Internal: RvD1-int1F AGC GCG TCG AAC ACC GGC; RvD1-int1R CCT GAA TCC GCG CAA TTC CAT 
  2nd internal*: RvD1-int2.F GAG CCA CTC CGA TGT TGA CT; RvD1-int2.R CAC GCG AAC CCT ACC TAC AT 

RvD2*: plcD, 5.1 kb 
  Internal: RvD2-int1F GTT CTC CTG TCG AAC CTC CA; RvD2-int1R ACT TCA CCG GTT TCA TCT CG 
  2nd internal*: RvD2-int2F GGA CGG TGA CGG TAT TTG TC; RvD2-int2R TCG CCA ACT TCT ATG GAC CT 

RvD3: 1.0 kb 
  Internal: RvD3-intF ATC GAT CAG GTC GTC AAT GC; RvD3-intR ACG CCA CCA TCA AGA TCC 
  Flanking: RvD3-flank.F AAA CCA TGC AGC GTC TGC CA; RvD3-flankR GCG TTT CTG CGT CTG GTT GA 

RvD4*: PPE gene, 0.8 kb 
  Internal: RvD4-intF-PPE GGT TGC CAA CGT TAC CGA TGC; RvD4-intR-PPE CCG GTG GTG GTG GCG GCT 
  2nd internal*: ND 

RvD5: moa, 4.0 kb 
  Internal: RvD5intF GGG TTC ACG TTC ATT ACT GTT C; RvD5intR CCT GCG CTT ATC TCT AGC GG 
  Flanking: RvD5-flankF CCC ATC GTG GTC GTT CAC C; RvD5-flankR GTA CCC GCA CCA CCT GCT G 

TbD1: mmpL6, 2.1 kb 
  Internal: TBD1intS.F CGT TCA ACC CCA AAC AGG TA; TBD1intS.R AAT CGA ACT CGT GGA ACA CC 
  Flanking: TBD1fla1-F CTA CCT CAT CTT CCG GTC CA; TBD1fla1-R CAT AGA TCC CGG ACA TGG TG 

katG, gyrA, oxyR', pncA and mmpL6 PCR and sequencing primers 

katG 463 
  PCR: katG-2154,225-PCR-F CTA CCA GCA CCG TCA TCT CA; katG-2155,157-PCR-R AGG TCG TAT GGA CGA ACA CC 
  Sequencing: katG-2154,872-SEQ-R ACA AGC TGA TCC ACC GAG AC 

gyrA 95 
  PCR: gyrA-7,127-PCR-F GTT CGT GTG TTG CGT CAA GT; gyrA-8,312-PCR-R CAG CTG GGT GTG CTT GTA AA 
  Sequencing: gyrA-7,461F CGG GTG CTC TAT GCA ATG TT 

oxyR' 285 
  PCR: oxyR-2725,559F TAT GCG ATC AGG CGT ACT TG; oxyR-2726,024-PCR-R CAA AGC AGT GGT TCA GCA GT 
  Sequencing: oxyR-2726,024-SEQ-R CAA AGC AGT GGT TCA GCA GT 
Table 1 (continued) 

pncA 57 
  PCR: pncA-2288,678-PCR-F ATC AGG AGC TGC AAA CCA AC; pncA-2289,319-PCR-R GGC GTC ATG GAC CCT ATA TC 
  Sequencing: pncA-2289,319-SEQ-R GGC GTC ATG GAC CCT ATA TC 

mmpL6 551 
  PCR: mmpL-seq5F GTA TCA GAG GGA CCG AGC AG; TBD1fla1-R CAT AGA TCC CGG ACA TGG TG 
  Sequencing: mmpL-seq5F GTA TCA GAG GGA CCG AGC AG 

The RD nomenclature used in this table is based on that used by Brosch et al. (2000) (Ref. 25) and 
differs from that proposed by Behr and coworkers (1999) (Ref. 6). Primer sequences are shown in the 
5' to 3' direction. 

* Regions where a second pair of internal primers was used rather than flanking primers, owing to 
flanking repetitive regions and/or mobile genetic elements. 
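Because the primers in Table 1 are written 5' to 3', predicting whether a primer pair amplifies a known sequence requires the reverse complement of the reverse primer. The sketch below is a minimal, exact-match in silico PCR in Python, offered only as an illustration under stated assumptions: the function names are hypothetical, no mismatches are tolerated, melting temperature and degenerate bases are ignored, and the template is simply searched on both strands. The synthetic template in the usage line is invented for the example; only the two RD1 internal primer sequences come from Table 1.

```python
from typing import Optional

# Complement table for plain A/C/G/T; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Drop the spaces used to group bases in Table 1 and upper-case the result."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return clean(seq).translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Predicted product length for an exact-match primer pair, or None.

    The forward primer must occur verbatim on one strand, and the reverse
    complement of the reverse primer must occur downstream of it on that same
    strand. Both strands of the template are tried, top strand first.
    """
    fwd = clean(forward)
    rev_rc = reverse_complement(reverse)
    top = clean(template)
    for strand in (top, reverse_complement(top)):
        start = strand.find(fwd)
        if start == -1:
            continue
        end = strand.find(rev_rc, start + len(fwd))
        if end != -1:
            return end + len(rev_rc) - start
    return None


# RD1 internal primer pair from Table 1 on a hypothetical synthetic template
# with 30 filler bases between the two primer sites.
RD1_F = "GTC AGC CAA GTC AGG CTA CC"
RD1_R = "CAA CGT TGT GGT TGT TGA GG"
template = "AAAA" + clean(RD1_F) + "T" * 30 + reverse_complement(RD1_R) + "GGGG"
print(amplicon_length(template, RD1_F, RD1_R))  # 70
```

Exact matching keeps the sketch short; a realistic check would at least tolerate mismatches away from the primer 3' ends, which dedicated primer-analysis tools handle.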
CLAIMS 

1. An isolated or purified nucleic acid, wherein said nucleic acid is selected from the 
group consisting of: 

a. SEQ ID N°1; 

b. Nucleic acid having a sequence fully complementary to SEQ ID N°1; 

c. Nucleic acid having at least 90% sequence identity after optimal 
alignment with a sequence defined in a) or b); 

d. Nucleic acid that hybridizes under stringent conditions with the nucleic 
acid defined in a) or b). 

2. A nucleic acid fragment comprising at least 8 to 2000 consecutive nucleotides 
comprised in at least one nucleic acid according to claim 1. 

3. The nucleic acid fragment according to claim 2, characterized in that it is capable of 
being used as a probe or a primer specific for SEQ ID N°1. 

4. The nucleic acid fragment according to claim 2, selected from the group consisting 
of: SEQ ID N°17, SEQ ID N°18. 

5. The nucleic acid fragment according to claim 2, characterized in that it is obtained 
by specific amplification of SEQ ID N°1 with the pair of primers SEQ ID N°17 and 
SEQ ID N°18. 

6. The nucleic acid fragment according to claim 2, wherein said nucleic acid fragment 
is: 

- specifically deleted from the genome of Mycobacterium tuberculosis, except 
in Mycobacterium tuberculosis strains having the sequence CTG at codon 463 
of gene katG and having no or very few IS6110 sequences inserted in their 
genome; and 

- present in the genome of Mycobacterium africanum, Mycobacterium canettii, 
Mycobacterium microti, Mycobacterium bovis and Mycobacterium bovis BCG. 

7. The nucleic acid fragment according to claim 2 or 6, selected from the group 
consisting of: 
a) SEQ ID N°4; 

b) Nucleic acid having a sequence fully complementary to SEQ ID N°4; 

c) Nucleic acid having at least 90% sequence identity after optimal alignment with a 
sequence defined in a) or b); 

d) Nucleic acid that hybridizes under stringent conditions with the nucleic acid 
defined in a) or b). 

8. A nucleic acid fragment comprising at least 8 to 2000 consecutive nucleotides of at 
least one nucleic acid according to claim 7. 

9. The nucleic acid fragment according to claim 2 or 8, characterized in that it is 
capable of being used as a probe or a primer specific for SEQ ID N°1 and SEQ ID 
N°4. 

10. The nucleic acid fragment according to claim 9, selected from the group consisting 
of: SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16. 

11. A nucleic acid fragment according to claim 9, characterized in that it is obtained by 
specific amplification of SEQ ID N°1 or SEQ ID N°4 with one pair of primers 
chosen from the group consisting of SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ 
ID N°16. 

12. The nucleic acid fragment according to claim 9, characterized in that it is obtained 
by specific amplification of SEQ ID N°1 or SEQ ID N°4 with the pair of primers 
SEQ ID N°13 and SEQ ID N°14. 

13. The nucleic acid fragment according to claim 9, characterized in that it is obtained 
by specific amplification of SEQ ID N°1 or SEQ ID N°4 with the pair of primers 
SEQ ID N°15 and SEQ ID N°16. 

14. The isolated or purified nucleic acid according to claim 1 wherein said nucleic acid 
comprises at least a deletion of a nucleic acid fragment according to any of claims 6, 
7 and 8. 
15. An isolated or purified polypeptide encoded by the nucleic acid according to any of 
claims 1, 2, 6, 7, 8 and 14. 

16. The polypeptide according to claim 15, selected among polypeptides with sequence 
SEQ ID N°6, SEQ ID N°8, SEQ ID N°10, SEQ ID N°12, SEQ ID N°22 and 
fragments thereof. 

17. An isolated or purified nucleic acid encoding a polypeptide according to claim 16. 

18. The isolated or purified nucleic acid according to claim 17, wherein said nucleic acid 
is selected among: 

- SEQ ID N°5 encoding the polypeptide of SEQ ID N°6; 

- SEQ ID N°7 encoding the polypeptide of SEQ ID N°8; 

- SEQ ID N°9 encoding the polypeptide of SEQ ID N°10; 

- SEQ ID N°11 encoding the polypeptide of SEQ ID N°12; 

- SEQ ID N°21 encoding the polypeptide of SEQ ID N°22; 
and fragments thereof. 

19. A recombinant vector comprising a nucleic acid sequence selected among nucleic 
acids according to any of claims 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13 and 14. 

20. The recombinant vector of claim 19, consisting of the vector named X229 introduced 
into the recombinant Escherichia coli deposited at the CNCM on February 18th, 
2002 under N° I-2799. 

21. A recombinant cell comprising a nucleic acid sequence selected among nucleic acids 
according to any of claims 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13 and 14 or a vector 
according to claim 19 or 20. 

22. The recombinant cell according to claim 21, consisting of the Escherichia coli 
deposited at the CNCM on February 18th, 2002 under N° I-2799. 

23. A method for the discriminatory detection and identification of: 

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having the 
sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences 
inserted in their genome; versus 

- Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, 
Mycobacterium bovis, Mycobacterium bovis BCG in a biological sample, 

comprising the following steps: 

a) isolation of the DNA from the biological sample to be analyzed or 
production of a cDNA from the RNA of the biological sample, 

b) detection of the nucleic acid sequences of the mycobacterium present in said 
biological sample, 

c) analysis for the presence or the absence of a nucleic acid fragment according 
to any of claims 6, 7 and 8. 

24. The method as claimed in claim 23, wherein the detection of the mycobacterial DNA 
sequences is carried out using nucleotide sequences complementary to said DNA 
sequences. 

25. The method as claimed in claim 23 or 24, wherein the detection of the mycobacterial 
DNA sequences is carried out by amplification of these sequences using primers. 

26. The method as claimed in claim 25, wherein the primers have a nucleotide sequence 
chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, 
SEQ ID N°16, SEQ ID N°17, SEQ ID N°18. 

27. A method for the discriminatory detection and identification of: 

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains having 
the sequence CTG at codon 463 of gene katG and having no or very few IS6110 sequences 
inserted in their genome; versus 

- Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, 
Mycobacterium bovis, Mycobacterium bovis BCG in a biological sample, 

comprising the following steps: 

a) bringing the biological sample to be analyzed into contact with at least one pair of 
primers as defined in claim 25 or 26, the DNA contained in the sample having been, where 
appropriate, made accessible to the hybridization beforehand, 
b) amplification of the DNA of the mycobacterium, 
c) visualization of the amplification of the DNA fragments. 

28. A kit for the discriminatory detection and identification of: 

- Mycobacterium tuberculosis, except Mycobacterium tuberculosis strains 
having the sequence CTG at codon 463 of gene katG and having no or very 
few IS6110 sequences inserted in their genome; versus 

- Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, 
Mycobacterium bovis, Mycobacterium bovis BCG in a biological sample, 

comprising the following elements: 
a) at least one pair of primers as defined in claim 25 or 26, 

b) the reagents necessary to carry out a DNA amplification reaction, 

c) optionally, the necessary components which make it possible to verify or 
compare the sequence and/or the size of the amplified fragment. 

29. The use of at least one pair of primers as defined in claim 25 or 26 for the 
amplification of a DNA sequence from Mycobacterium tuberculosis, Mycobacterium 
africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis 
or Mycobacterium bovis BCG. 

30. The use of at least one pair of primers or at least one nucleic acid fragment 
according to any of claims 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14 for the 
detection of a DNA sequence from Mycobacterium tuberculosis, Mycobacterium 
africanum, Mycobacterium canettii, Mycobacterium microti, Mycobacterium bovis 
or Mycobacterium bovis BCG. 

31. A product of expression of all or part of the nucleic acid fragment as claimed in any 
of claims 6, 7 and 8. 

32. A method for the in vitro discriminatory detection of antibodies directed against 
Mycobacterium tuberculosis, except Mycobacterium tuberculosis having the 
sequence CTG at codon 463 of gene katG and having no or very few IS6110 
sequences inserted in their genome, versus antibodies directed against 
Mycobacterium africanum, Mycobacterium canettii, Mycobacterium microti, 
Mycobacterium bovis, Mycobacterium bovis BCG, in a biological sample, 
comprising the following steps: 

a) bringing the biological sample into contact with at least one product as 
defined in claim 31, 

b) detecting the antigen-antibody complex formed. 

33. A method for the in vitro discriminatory detection of a vaccination with 
Mycobacterium bovis BCG, an infection by M. bovis, M. canettii, M. microti, M. 
africanum or M. tuberculosis strains having the sequence CTG at codon 463 of gene 
katG and having no or very few IS6110 sequences inserted in their genome, versus 
an infection by Mycobacterium tuberculosis, except Mycobacterium tuberculosis 
strains having the sequence CTG at codon 463 of gene katG and having no or very 
few IS6110 sequences inserted in their genome, in a mammal, comprising the 
following steps: 

a) preparation of a biological sample containing cells, more particularly cells of the 
immune system of said mammal and more particularly T cells, 
b) incubation of the biological sample of step a) with at least one product as defined 
in claim 31, 

c) detection of a cellular reaction indicating prior sensitization of the mammal to 
said product, in particular cell proliferation and/or synthesis of proteins such as 
gamma-interferon. 

34. A kit for the in vitro discriminatory diagnosis of a vaccination with M. bovis BCG, 
an infection by M. bovis, M. canettii, M. microti, M. africanum versus an infection 
by M. tuberculosis, except by strains having the sequence CTG at codon 463 of 
gene katG and having no or very few IS6110 sequences inserted in their genome, in 
a mammal, comprising: 

a) a product as defined in claim 31, 

b) where appropriate, the reagents for the constitution of the medium 
suitable for the immunological reaction, 

c) the reagents allowing the detection of the antigen-antibody complexes 
produced by the immunological reaction, 

d) where appropriate, a reference biological sample (negative control) free 
of antibodies recognized by said product, 

e) where appropriate, a reference biological sample (positive control) 
containing a predetermined quantity of antibodies recognized by said 
product. 

35. A mono- or polyclonal antibody, a chimeric fragment or a chimeric antibody thereof, 
characterized in that it is capable of specifically recognizing a product as defined in 
claim 31. 

36. A method for the in vitro discriminatory detection of the presence of an antigen of 
Mycobacterium tuberculosis, except strains having the sequence CTG at codon 
463 of gene katG and having no or very few IS6110 sequences inserted in their 
genome, versus an antigen of Mycobacterium africanum, Mycobacterium canettii, 
Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG or 
Mycobacterium tuberculosis having the sequence CTG at codon 463 of gene katG 
and having no or very few IS6110 sequences inserted in their genome, in a biological 
sample, comprising the following steps: 

a) bringing the biological sample into contact with an antibody as claimed 
in claim 35, 

b) detecting the antigen-antibody complex formed. 

37. A kit for the in vitro discriminatory detection of the presence of an antigen of 
Mycobacterium tuberculosis, except strains having the sequence CTG at codon 
463 of gene katG and having no or very few IS6110 sequences inserted in their 
genome, versus an antigen of Mycobacterium africanum, Mycobacterium canettii, 
Mycobacterium microti, Mycobacterium bovis, Mycobacterium bovis BCG, or 
Mycobacterium tuberculosis having the sequence CTG at codon 463 of gene katG 
and having no or very few IS6110 sequences inserted in their genome, in a 
biological sample, comprising the following elements: 

a) an antibody as claimed in claim 35, 

b) the reagents for constituting the medium suitable for the immunological 
reaction, 

c) the reagents allowing the detection of the antigen-antibody complexes 
produced by the immunological reaction. 

38. An immunogenic composition, characterized in that it comprises at least one product 
as defined in claim 31. 
39. A vaccine, characterized in that it comprises at least one product as defined in claim 
31 in combination with a pharmaceutically compatible vehicle and, where 
appropriate, one or more appropriate immunity adjuvants. 

40. An in vitro method for the detection and identification of Mycobacterium 
tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG 
at codon 463 of gene katG and having no or very few IS6110 sequences inserted in 
their genome, in a biological sample, comprising the following steps: 

a) isolation of the DNA from the biological sample to be analyzed or production of 
a cDNA from the RNA of the biological sample, 

b) detection of the nucleic acid sequences of the mycobacterium present in said 
biological sample, 

c) analysis for the presence or the absence of a nucleic acid fragment according to 
any of claims 6, 7 and 8. 

41. An in vitro method for the detection and identification of Mycobacterium 
tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG 
at codon 463 of gene katG and having no or very few IS6110 sequences inserted in 
their genome, in a biological sample, comprising the following steps: 

a) bringing the biological sample to be analyzed into contact with at least one pair 
of primers selected among nucleic acids according to any of claims 1 to 14, 17 
and 18, and more preferably selected among the primers chosen from the group 
comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, SEQ ID N°16, SEQ ID 
N°17, SEQ ID N°18, the DNA contained in the sample having been, where 
appropriate, made accessible to the hybridization beforehand, 

b) amplification of the DNA of the mycobacterium, 

c) visualization of the amplification of the DNA fragments. 

42. A kit for the detection and identification of Mycobacterium tuberculosis, except 
Mycobacterium tuberculosis strains having the sequence CTG at codon 463 of gene 
katG and having no or very few IS6110 sequences inserted in their genome, in a 
biological sample, comprising the following elements: 

a) at least one pair of primers selected among nucleic acids according to any of 
claims 1 to 14, 17 and 18, and more preferably selected among the primers 
chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, 
SEQ ID N°16, SEQ ID N°17, SEQ ID N°18, 

b) the reagents necessary to carry out a DNA amplification reaction, 

c) optionally, the necessary components which make it possible to verify or 
compare the sequence and/or the size of the amplified fragment. 

43. A method for the in vitro detection of antibodies directed against Mycobacterium 
tuberculosis, except Mycobacterium tuberculosis strains having the sequence CTG 
at codon 463 of gene katG and having no or very few IS6110 sequences inserted in 
their genome, in a biological sample, comprising the following steps: 

a) bringing the biological sample into contact with at least one product as defined 
in claim 31, 

b) detecting the antigen-antibody complex formed. 

44. Use of the TbD1 deletion as a genetic marker for the differentiation of Mycobacterium 
strains of the Mycobacterium complex. 

45. Use of the mmpL6 551 polymorphism as a genetic marker for the differentiation of 
Mycobacterium strains of the Mycobacterium complex. 

46. Use of the genetic marker according to claim 44 in association with at least one 
genetic marker selected among RD1, RD2, RD3, RD4, RD5, RD6, RD7, RD8, 
RD9, RD10, RD11, RD13, RD14, RvD1, RvD2, RvD3, RvD4, RvD5, katG 463, 
gyrA 95, oxyR' 285, pncA 57, mmpL6 551 and the specific insertion element of M. canettii for 
the differentiation of Mycobacterium strains of the Mycobacterium complex. 

47. An in vitro method for the detection and identification of Mycobacteria from the 
Mycobacterium complex in a biological sample, comprising the following steps: 

c) analysis for the presence or the absence of a nucleic acid fragment of a sequence 
according to claim 6, 7 or 8, and 

d) analysis of at least one additional genetic marker selected among RD1, RD2, RD3, 
RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD11, RD13, RD14, RvD1, RvD2, 
RvD3, RvD4, RvD5, katG 463, gyrA 95, oxyR' 285, pncA 57, mmpL6 551 and the specific 
insertion element of M. canettii. 

48. The in vitro method of claim 47, wherein two additional markers are used, preferably 
RD4 and RD9. 

49. The in vitro method of claim 47, wherein three additional markers are used, 
preferably RD4, RD9 and RD12. 

50. The method according to claim 47, wherein the analysis is performed by a technique 
selected among sequence hybridization, nucleic acid amplification and antigen-antibody 
complex formation. 

51. A kit for the detection and identification of Mycobacteria from the Mycobacterium 
complex in a biological sample, comprising the following elements: 

a) at least one pair of primers selected among nucleic acids according to any of 
claims 1 to 14, 17 and 18, and more preferably selected among the primers 
chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, 
SEQ ID N°16, SEQ ID N°17, SEQ ID N°18, 

b) at least one pair of primers specific for the genetic markers selected among RD1, 
RD2, RD3, RD4, RD5, RD6, RD7, RD8, RD9, RD10, RD11, RD13, RD14, 
RvD1, RvD2, RvD3, RvD4, RvD5, katG 463, gyrA 95, oxyR' 285, pncA 57, 
mmpL6 551 and the specific insertion element of M. canettii, 

c) the reagents necessary to carry out a DNA amplification reaction, 

d) optionally, the necessary components which make it possible to verify or 
compare the sequence and/or the size of the amplified fragment. 

52. A kit according to claim 51, comprising the following elements: 

a) at least one pair of primers selected among nucleic acids according to any of 
claims 1 to 14, 17 and 18, and more preferably selected among the primers 
chosen from the group comprising SEQ ID N°13, SEQ ID N°14, SEQ ID N°15, 
SEQ ID N°16, SEQ ID N°17, SEQ ID N°18, 
b) one pair of primers specific for the genetic marker RD4, 

c) one pair of primers specific for the genetic marker RD9, 

d) the reagents necessary to carry out a DNA amplification reaction, 

e) optionally, the necessary components which make it possible to verify or 
compare the sequence and/or the size of the amplified fragment. 
53. An immunogenic composition, characterized in that it comprises the polypeptide of 
sequence SEQ ID N°22. 

54. A vaccine, characterized in that it comprises the polypeptide of sequence SEQ ID 
N°22 in combination with a pharmaceutically compatible vehicle and, where 
appropriate, one or more appropriate immunity adjuvants. 

55. Use of the genetic marker according to claim 45 in association with at least one 
genetic marker selected among RD1, RD2, RD3, RD4, RD5, RD6, RD7, RD8, 
RD9, RD10, RD11, RD13, RD14, RvD1, RvD2, RvD3, RvD4, RvD5, TbD1, 
katG 463, gyrA 95, oxyR' 285, pncA 57 and the specific insertion element of M. canettii for 
the differentiation of Mycobacterium strains of the Mycobacterium complex. 

56. A nucleic acid specifically present in strains of M. canettii and absent from all other 
members of the Mycobacterium complex and having the sequence from position 399 
to position 2378 of SEQ ID N°19. 

57. Use of the nucleic acid according to claim 56 as a genetic marker for the 
differentiation of Mycobacterium strains of the Mycobacterium complex. 

58. A reagent for the identification of a Mycobacterium infection comprising at least 
polynucleotide sequences capable of hybridizing under stringent conditions with at 
least 8 to 20 nucleotides of the RD1, RD4, RD9 and TbD1 genetic markers. 

59. A reagent for the identification of a Mycobacterium infection comprising at least one 
polypeptide encoded by each of the RD1, RD4, RD9 and TbD1 genetic markers 
capable of reacting with an antibody or an immune serum raised against the same 
immunogenic molecules or fragments thereof. 
gatcccgtcg ccgcggcgct ggagctggcc gccgggcccg cagccgcccc gcgcgaggtc 60 
gtgctggcga gcaaagccac catgcgcgcc acagccagcc ccggatcgct ggaccttgag 120 
caacacgaac tcgccaaacg cttagaactt gggccgcagg cgaaatcggt ccagtcgccc 180 
gagttcgccg ctcgcttggc tgccgctcaa cacaggtagc gcctaccagc ctcgctggtt 240 
tccatggcgt gccccagtcc gaagctgctg ctgcttgact ccgcgcgctg ggcccgagcg 300 
cgcgctgttg tacggcccaa acggcgtgtc ggtgtacagt cgcgcgctcg cggcttcagt 360 
ccggcccccc gactccggca ggcccgacgg cgcccagcgc tagcccgaag ttcccccttg 420 
taggggcggg ctgagtttcg atctgtttcg tgagcaggtg tttctgtgtt caacttccct 480 
caacatgtac tcatgtatta ttgagaatag ctcggcgtgt catcctctga tgacgctatt 540 
atcgcgctga ccgcgtgtta taaagtaatc atgtacatta cccgggtacc caaccgggga 600 
tccccgccgg cggtgctgtt gcgggaaagc ttccgcgaaa acggcaaggt caagacgcgt 660 
accctggcca acctctcacg ctggcccgag cacaagctgg acagactgga ccgggcgctt 720 
aagggcttgc cgcccgcgga ctgggatcta gccgaggcct tcgatatcac ccgcagcctg 780 
ccgcacgggc atgtggccgc ggtggccggc accgccgaga agctgggcat acccgagctg 840 
atcgacccca ccccgtcgcg gcggcgcaac ctggtgctgg ccatgctgat cgggcagatc 900 
atcgagcccg gatcgaaact ggcgatcgcg cgcgggctgc gcgcccagac cgccaccagc 960 
acgctgggtg cggtgctggg tgtctcgggc gccgatgagg acgacctgta tgacgcgatg 1020 
gactgggcgc tggagcgcaa agacggcatc gaaaacgcct tggccgcacg gcatctgacc 1080 
aacggcaccc tggtgctcta tgacgtatcc tcggcggcgt tcgagggcca cacctgcccg 1140 
ctgggagcga tcgggcacgc ccgcgacggg gtcaaaggcc ggctgcagat cgtctacggg 1200 
ctgctgtgct cacccaaggg agcgccggtg gccatcgagg tgttcaaggg caacaccgcc 1260 
gacccgaaaa ctctgaaagc tcaaatcgac aagctcaaaa cccggttcgg gttgacccgc 1320 
atcgccctgg tgggcgatcg gggcatgctc acttccgcgc gcatccgtga cgagctgcgt 1380 
ccggcgcacc tggattggat cagcgcgctg cgcgccccgc agatcaagat cctgctcgag 1440 
gacggggcgc tgcagctgtc gctgttcgat gagcagaacc tgttcgagat cactcacccc 1500 
gactatcccg gtgagcggct ggtgtgcfcgc cacaaccccg ccctggccga cgagcgcgcc 1560 
cgcaaacgcg ccgagctgct ggcggccacc gaaaaggagc tgcaggccat cgccgaagcc 1620 
acccgccgcc aacgccggcc gttacgcggt acagacaaga tcggcctgcg ggtgggcaag 1680 
gtgcgcaaca agttcaagat ggccaagcac tttgacctgc acatcaccga tgaggccttc 1740 
agcttcaccc gcaaccagaa cagtatcgcc gccgaggccg ccctcgacgg catctacgtg 1800 
ctacgcacca gcctgcccga caacgccctg ggccgcgacg acgtggtggg ccgctacaaa 1860 
gacctcgccg acgtcgaacg cttcttccgc accctcaaca gcgaactgga cgtacgcccc 1920 
atccggcatc ggctggccga ccgggtccgc gcccacatgt tcttgcacat gctctcctac 1980 
tacatcagct ggcacatgaa acaagccctg gccccaatcc tgttcaccga caacgacaaa 2040 
cccgccgccg ccgccaaacg cgccgacccc gtcgcgccag cccaacgctc cgacgaagcg 2100 
ctgaacaagg cagcacgcaa acgcaccgaa gacaaccaac cggtgcacag cttcaccagc 2160 
ctgctcaccg acctggccac catctgcgcc aactacatcc aacccacaga cgacctgcca 2220 
gcattcacca aaaccaccac ccccaccccc acacaacggc gcgccttcga ccfcactggcc 2280 
gtttcccacc gccacggcct ggcgtagtca gtaccgaacc acaaatgccc aggtcaacga 2340 
cacaaaccgc gccggatcac ggggaacttc gggctagccg ggcgcgccgg 2390 
SEQUENCE LISTING 

<110> INSTITUT PASTEUR 
      VETERINARY LABORATORIES AGENCY 

<120> DELETED SEQUENCE IN M. TUBERCULOSIS, METHOD FOR 
      DETECTING MYCOBACTERIA USING THESE SEQUENCES AND 
      VACCINES 

<130> D20110 

<160> 22 

<170> PatentIn Ver. 2.1 

<210> 1 
<211> 3953 
<212> DNA 
<213> Mycobacterium tuberculosis strain 74 ("ancestral" strain) 

<220> 
<221> CDS 
<222> (735)..(3638) 



tccagcgcgg ccatcagcga tgaactctgg gacctgctac ccggctacct catcttccgg 60 
tccatcatcc ccaaccggcc gcccacccag gacacggtgc aagccctcgt cgacgacgtg 120 
atactcccca gcctcacccg atccaccggt tgagtcagcg gtgcgaatgg ctgggcaccg 180 
ttgtggtgtc cggtcccgta ccgtactgtt gaatccgcgg atccccgcct gaggtacggg 240 
gcgtggtcgc gccccgggca atagcgtcgc cggttatcga aaggctaacg ggtgcagggg 300 
atttcagtga ctggcctggt caaacgcggc tggatggtgc tggttgccgt ggcggtggtg 360 
gcggtcgcgg gattcagcgt ctatcggttg cacggcatct tcggctcgca cgacaccacc 420
tcgaccgccg gtggtgtcgc gaacgacatc aagccgttca accccaaaca ggtaaccctc 480 
gaggtctttg gcgctcccgg aaccgtggca acgatcaatt atctggacgt ggatgccaca 540 
cctcggcaag tcctggacac gaccctgccg tggtcataca cgatcacgac gaccctgccc 600 
gcggtcttcg ccaatgttgt cgcgcaaggc gacagcaatt ccatcggctg ccgcatcacc 660 
gtcaacggtg tagtcaagga cgaaaggatc gtcaacgaag tgcgcgccta taccttctgc 720 
.tcgacaagt cctc atg agc aac cac cac cgc ccg cgg act gj ttg ccg 770



1 3 

818 



^ nht tea ttq ccg ate ttg ctg ttt tgg gtg ggt 
cac acc ate cga egg ctt teg ^3 ^ y ^ val Gly 

His Thr He Arg Arg Leu Ser Leu Pro He Leu Leu Phe Trp 

on ^ 



15 20 



^ ^a acc aat gec gec gtg ccg caa ttg gag gtg gtc ggg 
fa! Ill HI SI E En L. La Val Pro .In Leu Olu Val Val Gly 
30 35 



866 
gag gcg cat aac gtc gca cag agc tcc ccg gat gac ccg tcg ctg cag 914 
Glu Ala His Asn Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln 
45 50 55 60 

gcg atg aaa cgc atc ggc aag gtg ttc cac gag ttc gat tcc gac agt 962 
Ala Met Lys Arg Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser 
65 70 75 

gcg gcc atg atc gtc ttg gaa ggc gat aag ccg ctc ggc aac gac gcc 1010 
Ala Ala Met Ile Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala 
80 85 90 

cac cgg ttc tac gac acc ctg ctc cgc aac ctt tca aac gac acc aaa 1058 
His Arg Phe Tyr Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys 
95 100 105 

cac gtc gag cac gtt cag gac ttc tgg ggc gat ccg ctg acc gcg gcc 1106 
His Val Glu His Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala 
110 115 120 

ggc tcg caa agc acc gac ggc aaa gcc gcc tac gtt cag gtc tat ctc 1154 
Gly Ser Gln Ser Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu 
125 130 135 140 

gcc ggc aac caa ggc gag gcg ttg tca atc gag tcc gtc gac gcg gtg 1202 
Ala Gly Asn Gln Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val 
145 150 155 

cgc gac atc gtc gcc cat acg cca cca ccg gcc ggg gtc aag gcc tac 1250 
Arg Asp Ile Val Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr 
160 165 170 

gtc acc ggc gcg gcc ccg ctc atg gcc gat cag ttt cag gtg ggc agc 1298 
Val Thr Gly Ala Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser 
175 180 185 



aaa gga acc gcg aaa gtt acc ggg ata act ctg gtt gtg atc gcg gtg 1346
Lys Gly Thr Ala Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val
190 195 200

atg ttg ctc ttc gta tac cgt tcc gtc gtc acc atg gtc ctg gtg ctt 1394
Met Leu Leu Phe Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu
205 210 215 220

atc acg gtt ctt att gag ttg gcc gcg gcc cgc ggg atc gtc gct ttt 1442
Ile Thr Val Leu Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe
225 230 235

ctc gga aac gcc ggg gta atc ggg ctg tcg aca tac tcg acg aat ctg 1490
Leu Gly Asn Ala Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu
240 245 250

ctc aca cta ttg gta atc gcg gcg ggc aca gac tac gcg att ttt gtc 1538
Leu Thr Leu Leu Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val
255 260 265

ctc ggc cgc tat cac gag gcg cgc tac gcc gca cag gat cgg gaa acg 1586
Leu Gly Arg Tyr His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr
270 275 280

gcc ttc tac acg atg tat cgc ggg acc gcc cac gtc gtc ttg ggc tcg 1634
Ala Phe Tyr Thr Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser
285 290 295 300

ggt ctg acc gtt gcc ggc gcg gtg tat tgc ctg agc ttt acc cgg cta 1682
Gly Leu Thr Val Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu
305 310 315

ccc tat ttt caa agc ctg ggt att ccc gcc tcg ata ggg gtg atg att 1730
Pro Tyr Phe Gln Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile
320 325 330

gcg ttg gca gcc gcg ctc agc ctg gcc cca tcc gtg ctc atc ttg ggc 1778
Ala Leu Ala Ala Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly
335 340 345

agt cgt ttc ggt tgt ttc gaa ccc aag cgc agg atg agg acc agg gga 1826
Ser Arg Phe Gly Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly
350 355 360

tgg cgg cgc atc ggc acg gcc atc gtg cgt tgg ccg gga ccc atc ctg 1874
Trp Arg Arg Ile Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu
365 370 375 380

gca gtg gcg tgc gca att gcg gtg gtg ggt ctg ctc gcg ctg ccg gga 1922
Ala Val Ala Cys Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly
385 390 395

tac aaa acg agc tac gac gct cgc tat tac atg ccc gcc acc gcc ccg 1970
Tyr Lys Thr Ser Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro
400 405 410

gcc aat att ggc tac atg gcc gcg gag cga cat ttt ccc caa gcg cgg 2018
Ala Asn Ile Gly Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg
415 420 425

ctg aat ccc gaa cta ctg atg atc gag acg gat cac gat atg cgc aat 2066
Leu Asn Pro Glu Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn
430 435 440

ccg gcc gac atg ctc atc ttg gat agg atc gcc aag gct gtc ttc cat 2114
Pro Ala Asp Met Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His
445 450 455 460

ctg ccc ggc ata ggg ctg gtg cag gcc atg acc cgg ccg cta gga acc 2162
Leu Pro Gly Ile Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr
465 470 475

ccg att gac cac agc tcg ata ccg ttt cag atc agc atg caa agc gtc 2210
Pro Ile Asp His Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val
480 485 490

ggc cag att cag aat ctc aag tat cag agg gac cga gca gcc gac ttg 2258
Gly Gln Ile Gln Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu
495 500 505

ctg aag cag gcc gaa gag ctg ggg aag acg atc gaa atc ttg cag cgc 2306
Leu Lys Gln Ala Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg
510 515 520

caa tat gcc cta cag cag gaa ctc gcg gcc gct act cac gag caa gcc 2354
Gln Tyr Ala Leu Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala
525 530 535 540



gaa agc ttt cac caa acg atc gcc acg gta aac gaa ctg cga gat agg 2402
Glu Ser Phe His Gln Thr Ile Ala Thr Val Asn Glu Leu Arg Asp Arg
545 550 555

atc gcc aat ttc gac gat ttc ttc agg ccg att cgt agt tac ttt tac 2450
Ile Ala Asn Phe Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr
560 565 570

tgg gaa aag cac tgc tac gat atc ccg agc tgc tgg gcg ctg aga tcc 2498
Trp Glu Lys His Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu Arg Ser
575 580 585

gtc ttt gac acg atc gac ggt atc gac caa ctc ggc gag cag ctg gcc 2546
Val Phe Asp Thr Ile Asp Gly Ile Asp Gln Leu Gly Glu Gln Leu Ala
590 595 600

agc gtg acc gta acc ttg gac aag ttg gct gcg atc cag cct caa ttg 2594
Ser Val Thr Val Thr Leu Asp Lys Leu Ala Ala Ile Gln Pro Gln Leu
605 610 615 620

gtg gcg ctg cta cca gac gag atc gcc agc cag cag atc aat cgg gaa 2642
Val Ala Leu Leu Pro Asp Glu Ile Ala Ser Gln Gln Ile Asn Arg Glu
625 630 635

ctg gcg ctg gct aac tac gcc acc atg tcc ggg atc tat gcc cag acg 2690
Leu Ala Leu Ala Asn Tyr Ala Thr Met Ser Gly Ile Tyr Ala Gln Thr
640 645 650

gcg gcc ttg atc gaa aac gct gcc gcc atg gga caa gcc ttt gac gcc 2738
Ala Ala Leu Ile Glu Asn Ala Ala Ala Met Gly Gln Ala Phe Asp Ala
655 660 665

gcc aag aac gac gac tcc ttc tat ctg ccg ccg gag gct ttt gac aac 2786
Ala Lys Asn Asp Asp Ser Phe Tyr Leu Pro Pro Glu Ala Phe Asp Asn
670 675 680

cca gat ttc cag cgc ggc ctg aaa ttg ttc ctg tcg gca gac ggt aag 2834
Pro Asp Phe Gln Arg Gly Leu Lys Leu Phe Leu Ser Ala Asp Gly Lys
685 690 695 700

gcg gct cgg atg atc atc tcc cat gaa ggc gat ccc gcc acc ccc gaa 2882
Ala Ala Arg Met Ile Ile Ser His Glu Gly Asp Pro Ala Thr Pro Glu
705 710 715

ggc att tcg cat atc gac gcg atc aag cag gcg gcc cac gag gcc gtg 2930
Gly Ile Ser His Ile Asp Ala Ile Lys Gln Ala Ala His Glu Ala Val
720 725 730

aag ggc act ccc atg gcg ggt gct ggg atc tat ctg gcc ggc acg gcc 2978
Lys Gly Thr Pro Met Ala Gly Ala Gly Ile Tyr Leu Ala Gly Thr Ala
735 740 745

gcc acc ttc aag gac att caa gac ggc gcc acc tac gac ctc ctg atc 3026
Ala Thr Phe Lys Asp Ile Gln Asp Gly Ala Thr Tyr Asp Leu Leu Ile
750 755 760

gcc gga ata gcc gcg ctg agc ttg att ttg ctc atc atg atg atc att 3074
Ala Gly Ile Ala Ala Leu Ser Leu Ile Leu Leu Ile Met Met Ile Ile
765 770 775 780

acc cga agc ctg gtt gcg gcg ctg gtg atc gtg ggc acg gtg gcg ctg 3122
Thr Arg Ser Leu Val Ala Ala Leu Val Ile Val Gly Thr Val Ala Leu
785 790 795

tcg ttg ggc gct tct ttt ggc ctg tcc gtg ctg gtg tgg cag cat ctt 3170
Ser Leu Gly Ala Ser Phe Gly Leu Ser Val Leu Val Trp Gln His Leu
800 805 810

ctc ggt atc cag ttg tac tgg atc gtg ctc gcg ctg gcc gtc atc ctg 3218
Leu Gly Ile Gln Leu Tyr Trp Ile Val Leu Ala Leu Ala Val Ile Leu
815 820 825

ctc ctg gcc gtg gga tcg gac tat aac ttg ctg ctg att tcc cga ttc 3266
Leu Leu Ala Val Gly Ser Asp Tyr Asn Leu Leu Leu Ile Ser Arg Phe
830 835 840

aag gag gag atc ggt gca ggt ttg aac acc ggc atc atc cgt gcg atg 3314
Lys Glu Glu Ile Gly Ala Gly Leu Asn Thr Gly Ile Ile Arg Ala Met
845 850 855 860

gcc ggc acc ggc ggg gtg gtg acc gct gcc ggc ctg gtg ttc gcc gcc 3362
Ala Gly Thr Gly Gly Val Val Thr Ala Ala Gly Leu Val Phe Ala Ala
865 870 875

act atg tct tcg ttc gtg ttc agt gat ttg cgg gtc ctc ggt cag atc 3410
Thr Met Ser Ser Phe Val Phe Ser Asp Leu Arg Val Leu Gly Gln Ile
880 885 890

ggg acc acc att ggt ctt ggg ctg ctg ttc gac acg ctg gtg gtg cgc 3458
Gly Thr Thr Ile Gly Leu Gly Leu Leu Phe Asp Thr Leu Val Val Arg
895 900 905

gcg ttc atg acc ccg tcc atc gcg gtg ctg ctc ggg cgc tgg ttc tgg 3506
Ala Phe Met Thr Pro Ser Ile Ala Val Leu Leu Gly Arg Trp Phe Trp
910 915 920

tgg ccg caa cga gtg cgc ccg cgc cct gcc agc agg atg ctt cgg ccg 3554
Trp Pro Gln Arg Val Arg Pro Arg Pro Ala Ser Arg Met Leu Arg Pro
925 930 935 940

tac ggc ccg cgg ccc gtg gtt cgt gaa ttg ctg ctg cgc gag ggc aac 3602
Tyr Gly Pro Arg Pro Val Val Arg Glu Leu Leu Leu Arg Glu Gly Asn
945 950 955

gat gac ccg aga act cag gtg gct acc cac cgt taa ggtggtggga 3648
Asp Asp Pro Arg Thr Gln Val Ala Thr His Arg
960 965



t g cc 9 ctttc . gggg .at,t „c W . c 9 =tc g act g 9 tc 9 c 9 c g a g c,a g cc g .=. 370S 
c 9 tatat g aa g tcc g3 c 9g . .cc 9 , g99 ca c.ca g =t g c, gg9 a.. 9 =c g g tcatcct g c 3,SS 
tc,cc,cc 9 t c ggg9 c g a. g acc 99 =..ac tcc 9 t„ g ac ccc g ct g at g = g c g tc 9 a g c 3S.S 
,c g .c g9 =c, g t,c 9 c g ,tc g tc 9 cctc 9 c t 999 t 9gg9 c 9 =c g aaaa,t cc g9 tct 99 t 3SSS 
ac=ac.ac 9 t c 9 t 9 a. g ..c cc,c 9g9 tc 9 . g =t 9 =. gg a c 9g c,cc g9 , cc 9g c g ,cta 3S.S 



3953 



cgacg 
<210> 2
<211> 967
<212> PRT

<213> Mycobacterium tuberculosis strain 74 ("ancestral" strain)
<220>

<223> mmpL6 protein
<400> 2

Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro His Thr Ile Arg
1 5 10 15

Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly Val Ala Ala Ile
20 25 30

Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly Glu Ala His Asn
35 40 45

Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln Ala Met Lys Arg
50 55 60

Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser Ala Ala Met Ile
65 70 75 80

Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala His Arg Phe Tyr
85 90 95



Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys His Val Glu His
100 105 110

Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala Gly Ser Gln Ser
115 120 125

Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu Ala Gly Asn Gln
130 135 140

Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val Arg Asp Ile Val
145 150 155 160

Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr Val Thr Gly Ala
165 170 175

Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser Lys Gly Thr Ala
180 185 190

Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val Met Leu Leu Phe
195 200 205

Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu Ile Thr Val Leu
210 215 220

Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe Leu Gly Asn Ala
225 230 235 240

Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu Leu Thr Leu Leu
245 250 255

Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val Leu Gly Arg Tyr
260 265 270

His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr Ala Phe Tyr Thr
275 280 285

Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser Gly Leu Thr Val
290 295 300

Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu Pro Tyr Phe Gln
305 310 315 320

Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile Ala Leu Ala Ala
325 330 335

Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly Ser Arg Phe Gly
340 345 350

Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly Trp Arg Arg Ile
355 360 365

Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu Ala Val Ala Cys
370 375 380

Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly Tyr Lys Thr Ser
385 390 395 400

Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro Ala Asn Ile Gly
405 410 415

Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg Leu Asn Pro Glu
420 425 430

Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn Pro Ala Asp Met
435 440 445

Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His Leu Pro Gly Ile
450 455 460

Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr Pro Ile Asp His
465 470 475 480

Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val Gly Gln Ile Gln
485 490 495

Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu Leu Lys Gln Ala
500 505 510

Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg Gln Tyr Ala Leu
515 520 525

Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala Glu Ser Phe His
530 535 540

Gln Thr Ile Ala Thr Val Asn Glu Leu Arg Asp Arg Ile Ala Asn Phe
545 550 555 560

Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr Trp Glu Lys His
565 570 575

Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu Arg Ser Val Phe Asp Thr
580 585 590

Ile Asp Gly Ile Asp Gln Leu Gly Glu Gln Leu Ala Ser Val Thr Val
595 600 605

Thr Leu Asp Lys Leu Ala Ala Ile Gln Pro Gln Leu Val Ala Leu Leu
610 615 620

Pro Asp Glu Ile Ala Ser Gln Gln Ile Asn Arg Glu Leu Ala Leu Ala
625 630 635 640

Asn Tyr Ala Thr Met Ser Gly Ile Tyr Ala Gln Thr Ala Ala Leu Ile
645 650 655

Glu Asn Ala Ala Ala Met Gly Gln Ala Phe Asp Ala Ala Lys Asn Asp
660 665 670

Asp Ser Phe Tyr Leu Pro Pro Glu Ala Phe Asp Asn Pro Asp Phe Gln
675 680 685

Arg Gly Leu Lys Leu Phe Leu Ser Ala Asp Gly Lys Ala Ala Arg Met
690 695 700

Ile Ile Ser His Glu Gly Asp Pro Ala Thr Pro Glu Gly Ile Ser His
705 710 715 720

Ile Asp Ala Ile Lys Gln Ala Ala His Glu Ala Val Lys Gly Thr Pro
725 730 735

Met Ala Gly Ala Gly Ile Tyr Leu Ala Gly Thr Ala Ala Thr Phe Lys
740 745 750

Asp Ile Gln Asp Gly Ala Thr Tyr Asp Leu Leu Ile Ala Gly Ile Ala
755 760 765

Ala Leu Ser Leu Ile Leu Leu Ile Met Met Ile Ile Thr Arg Ser Leu
770 775 780

Val Ala Ala Leu Val Ile Val Gly Thr Val Ala Leu Ser Leu Gly Ala
785 790 795 800

Ser Phe Gly Leu Ser Val Leu Val Trp Gln His Leu Leu Gly Ile Gln
805 810 815

Leu Tyr Trp Ile Val Leu Ala Leu Ala Val Ile Leu Leu Leu Ala Val
820 825 830

Gly Ser Asp Tyr Asn Leu Leu Leu Ile Ser Arg Phe Lys Glu Glu Ile
835 840 845

Gly Ala Gly Leu Asn Thr Gly Ile Ile Arg Ala Met Ala Gly Thr Gly
850 855 860

Gly Val Val Thr Ala Ala Gly Leu Val Phe Ala Ala Thr Met Ser Ser
865 870 875 880

Phe Val Phe Ser Asp Leu Arg Val Leu Gly Gln Ile Gly Thr Thr Ile
885 890 895

Gly Leu Gly Leu Leu Phe Asp Thr Leu Val Val Arg Ala Phe Met Thr
900 905 910

Pro Ser Ile Ala Val Leu Leu Gly Arg Trp Phe Trp Trp Pro Gln Arg
915 920 925

Val Arg Pro Arg Pro Ala Ser Arg Met Leu Arg Pro Tyr Gly Pro Arg
930 935 940

Pro Val Val Arg Glu Leu Leu Leu Arg Glu Gly Asn Asp Asp Pro Arg
945 950 955 960

Thr Gln Val Ala Thr His Arg
965



<210> 3 
<211> 148 
<212> PRT 



<213> Mycobacterium tuberculosis strain 74 ("ancestral" strain)

<220>

<223> mmpS6 protein

<400> 3

Val Gln Gly Ile Ser Val Thr Gly Leu Val Lys Arg Gly Trp Met Val
1 5 10 15

Leu Val Ala Val Ala Val Val Ala Val Ala Gly Phe Ser Val Tyr Arg
20 25 30

Leu His Gly Ile Phe Gly Ser His Asp Thr Thr Ser Thr Ala Gly Gly
35 40 45

Val Ala Asn Asp Ile Lys Pro Phe Asn Pro Lys Gln Val Thr Leu Glu
50 55 60

Val Phe Gly Ala Pro Gly Thr Val Ala Thr Ile Asn Tyr Leu Asp Val
65 70 75 80

Asp Ala Thr Pro Arg Gln Val Leu Asp Thr Thr Leu Pro Trp Ser Tyr
85 90 95

Thr Ile Thr Thr Thr Leu Pro Ala Val Phe Ala Asn Val Val Ala Gln
100 105 110

Gly Asp Ser Asn Ser Ile Gly Cys Arg Ile Thr Val Asn Gly Val Val
115 120 125

Lys Asp Glu Arg Ile Val Asn Glu Val Arg Ala Tyr Thr Phe Cys Leu
130 135 140

Asp Lys Ser Ser
145



<210> 4 
<211> 2153 

<212> DNA
<213> Mycobacterium tuberculosis strain 74 ("ancestral" strain)

<220>
<223> Sequence specifically deleted in "modern" strains of
Mycobacterium tuberculosis
<400> 4

ctggttgccg tggcggtggt ggcggtcgcg ggattcagcg tctatcggtt gcacggcatc 60 

ttcggctcgc acgacaccac ctcgaccgcc ggtggtgtcg cgaacgacat caagccgttc 120

aaccccaaac aggtaaccct cgaggtcttt ggcgctcccg gaaccgtggc aacgatcaat 180 

tatctggacg tggatgccac acctcggcaa gtcctggaca cgaccctgcc gtggtcatac 240 

acgatcacga cgaccctgcc cgcggtcttc gccaatgttg tcgcgcaagg cgacagcaat 300 

tccatcggct gccgcatcac cgtcaacggt gtagtcaagg acgaaaggat cgtcaacgaa 360

gtgcgcgcct ataccttctg cctcgacaag tcctcatgag caaccaccac cgcccgcggc 420 

cttggttgcc gcacaccatc cgacggcttt cgttgccgat cttgctgttt tgggtgggtg 480 

tggccgccat aaccaatgcc gccgtgccgc aattggaggt ggtcggggag gcgcataacg 540

tcgcacagag ctccccggat gacccgtcgc tgcaggcgat gaaacgcatc ggcaaggtgt 600 

tccacgagtt cgattccgac agtgcggcca tgatcgtctt ggaaggcgat aagccgctcg 660 

gcaacgacgc ccaccggttc tacgacaccc tgctccgcaa cctttcaaac gacaccaaac 720 

acgtcgagca cgttcaggac ttctggggcg atccgctgac cgcggccggc tcgcaaagca 780 

ccgacggcaa agccgcctac gttcaggtct atctcgccgg caaccaaggc gaggcgttgt 840 

caatcgagtc cgtcgacgcg gtgcgcgaca tcgtcgccca tacgccacca ccggccgggg 900 

tcaaggccta cgtcaccggc gcggccccgc tcatggccga tcagtttcag gtgggcagca 960 

aaggaaccgc gaaagttacc gggataactc tggttgtgat cgcggtgatg ttgctcttcg 1020

tataccgttc cgtcgtcacc atggtcctgg tgcttatcac ggttcttatt gagttggccg 1080 

cggcccgcgg gatcgtcgct tttctcggaa acgccggggt aatcgggctg tcgacatact 1140

cgacgaatct gctcacacta ttggtaatcg cggcgggcac agactacgcg atttttgtcc 1200 

tcggccgcta tcacgaggcg cgctacgccg cacaggatcg ggaaacggcc ttctacacga 1260 

tgtatcgcgg gaccgcccac gtcgtcttgg gctcgggtct gaccgttgcc ggcgcggtgt 1320

attgcctgag ctttacccgg ctaccctatt ttcaaagcct gggtattccc gcctcgatag 1380

gggtgatgat tgcgttggca gccgcgctca gcctggcccc atccgtgctc atcttgggca 1440

gtcgtttcgg ttgtttcgaa cccaagcgca ggatgaggac caggggatgg cggcgcatcg 1500

gcacggccat cgtgcgttgg ccgggaccca tcctggcagt ggcgtgcgca attgcggtgg 1560 

tgggtctgct cgcgctgccg ggatacaaaa cgagctacga cgctcgctat tacatgcccg 1620

ccaccgcccc ggccaatatt ggctacatgg ccgcggagcg acattttccc caagcgcggc 1680 

tgaatcccga actactgatg atcgagacgg atcacgatat gcgcaatccg gccgacatgc 1740 

tcatcttgga taggatcgcc aaggctgtct tccatctgcc cggcataggg ctggtgcagg 1800 

ccatgacccg gccgctagga accccgattg accacagctc gataccgttt cagatcagca 1860 

tgcaaagcgt cggccagatt cagaatctca agtatcagag ggaccgagca gccgacttgc 1920

tgaagcaggc cgaagagctg gggaagacga tcgaaatctt gcagcgccaa tatgccctac 1980 

agcaggaact cgcggccgct actcacgagc aagccgaaag ctttcaccaa acgatcgcca 2040

cggtaaacga actgcgagat aggatcgcca atttcgacga tttcttcagg ccgattcgta 2100 

gttactttta ctgggaaaag cactgctacg atatcccgag ctgctgggcg ctg 2153 

<210> 5 
<211> 2904 
<212> DNA 

<213> Mycobacterium complex

<220> 

<223> mmpL6 coding sequence and protein 

<220> 

<221> CDS 

<222> (1)..(2901)

<400> 5 

atg agc aac cac cac cgc ccg cgg cct tgg ttg ccg cac acc atc cga 48
Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro His Thr Ile Arg
1 5 10 15

cgg ctt tcg ttg ccg atc ttg ctg ttt tgg gtg ggt gtg gcc gcc ata 96
Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly Val Ala Ala Ile
20 25 30

acc aat gcc gcc gtg ccg caa ttg gag gtg gtc ggg gag gcg cat aac 144
Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly Glu Ala His Asn
35 40 45

gtc gca cag agc tcc ccg gat gac ccg tcg ctg cag gcg atg aaa cgc 192
Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln Ala Met Lys Arg
50 55 60

atc ggc aag gtg ttc cac gag ttc gat tcc gac agt gcg gcc atg atc 240
Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser Ala Ala Met Ile
65 70 75 80

gtc ttg gaa ggc gat aag ccg ctc ggc aac gac gcc cac cgg ttc tac 288
Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala His Arg Phe Tyr
85 90 95

gac acc ctg ctc cgc aac ctt tca aac gac acc aaa cac gtc gag cac 336
Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys His Val Glu His
100 105 110

gtt cag gac ttc tgg ggc gat ccg ctg acc gcg gcc ggc tcg caa agc 384
Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala Gly Ser Gln Ser
115 120 125

acc gac ggc aaa gcc gcc tac gtt cag gtc tat ctc gcc ggc aac caa 432
Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu Ala Gly Asn Gln
130 135 140

ggc gag gcg ttg tca atc gag tcc gtc gac gcg gtg cgc gac atc gtc 480
Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val Arg Asp Ile Val
145 150 155 160

gcc cat acg cca cca ccg gcc ggg gtc aag gcc tac gtc acc ggc gcg 528
Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr Val Thr Gly Ala
165 170 175

gcc ccg ctc atg gcc gat cag ttt cag gtg ggc agc aaa gga acc gcg 576
Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser Lys Gly Thr Ala
180 185 190

aaa gtt acc ggg ata act ctg gtt gtg atc gcg gtg atg ttg ctc ttc 624
Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val Met Leu Leu Phe
195 200 205

gta tac cgt tcc gtc gtc acc atg gtc ctg gtg ctt atc acg gtt ctt 672
Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu Ile Thr Val Leu
210 215 220

att gag ttg gcc gcg gcc cgc ggg atc gtc gct ttt ctc gga aac gcc 720
Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe Leu Gly Asn Ala
225 230 235 240

ggg gta atc ggg ctg tcg aca tac tcg acg aat ctg ctc aca cta ttg 768
Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu Leu Thr Leu Leu
245 250 255

gta atc gcg gcg ggc aca gac tac gcg att ttt gtc ctc ggc cgc tat 816
Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val Leu Gly Arg Tyr
260 265 270

cac gag gcg cgc tac gcc gca cag gat cgg gaa acg gcc ttc tac acg 864
His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr Ala Phe Tyr Thr
275 280 285

atg tat cgc ggg acc gcc cac gtc gtc ttg ggc tcg ggt ctg acc gtt 912
Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser Gly Leu Thr Val
290 295 300

gcc ggc gcg gtg tat tgc ctg agc ttt acc cgg cta ccc tat ttt caa 960
Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu Pro Tyr Phe Gln
305 310 315 320

agc ctg ggt att ccc gcc tcg ata ggg gtg atg att gcg ttg gca gcc 1008
Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile Ala Leu Ala Ala
325 330 335

gcg ctc agc ctg gcc cca tcc gtg ctc atc ttg ggc agt cgt ttc ggt 1056
Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly Ser Arg Phe Gly
340 345 350

tgt ttc gaa ccc aag cgc agg atg agg acc agg gga tgg cgg cgc atc 1104
Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly Trp Arg Arg Ile
355 360 365

ggc acg gcc atc gtg cgt tgg ccg gga ccc atc ctg gca gtg gcg tgc 1152
Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu Ala Val Ala Cys
370 375 380

gca att gcg gtg gtg ggt ctg ctc gcg ctg ccg gga tac aaa acg agc 1200
Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly Tyr Lys Thr Ser
385 390 395 400

tac gac gct cgc tat tac atg ccc gcc acc gcc ccg gcc aat att ggc 1248
Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro Ala Asn Ile Gly
405 410 415

tac atg gcc gcg gag cga cat ttt ccc caa gcg cgg ctg aat ccc gaa 1296
Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg Leu Asn Pro Glu
420 425 430

cta ctg atg atc gag acg gat cac gat atg cgc aat ccg gcc gac atg 1344
Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn Pro Ala Asp Met
435 440 445

ctc atc ttg gat agg atc gcc aag gct gtc ttc cat ctg ccc ggc ata 1392
Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His Leu Pro Gly Ile
450 455 460

ggg ctg gtg cag gcc atg acc cgg ccg cta gga acc ccg att gac cac 1440
Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr Pro Ile Asp His
465 470 475 480

agc tcg ata ccg ttt cag atc agc atg caa agc gtc ggc cag att cag 1488
Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val Gly Gln Ile Gln
485 490 495

aat ctc aag tat cag agg gac cga gca gcc gac ttg ctg aag cag gcc 1536
Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu Leu Lys Gln Ala
500 505 510

gaa gag ctg ggg aag acg atc gaa atc ttg cag cgc caa tat gcc cta 1584
Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg Gln Tyr Ala Leu
515 520 525



cag cag gaa ctc gcg gcc gct act cac gag caa gcc gaa agc ttt cac 1632
Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala Glu Ser Phe His
530 535 540

caa acg atc gcc acg gta aag gaa ctg cga gat agg atc gcc aat ttc 1680
Gln Thr Ile Ala Thr Val Lys Glu Leu Arg Asp Arg Ile Ala Asn Phe
545 550 555 560

gac gat ttc ttc agg ccg att cgt agt tac ttt tac tgg gaa aag cac 1728
Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr Trp Glu Lys His
565 570 575

tgc tac gat atc ccg agc tgc tgg gcg ctg aga tcc gtc ttt gac acg 1776
Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu Arg Ser Val Phe Asp Thr
580 585 590

atc gac ggt atc gac caa ctc ggc gag cag ctg gcc agc gtg acc gta 1824
Ile Asp Gly Ile Asp Gln Leu Gly Glu Gln Leu Ala Ser Val Thr Val
595 600 605

acc ttg gac aag ttg gct gcg atc cag cct caa ttg gtg gcg ctg cta 1872
Thr Leu Asp Lys Leu Ala Ala Ile Gln Pro Gln Leu Val Ala Leu Leu
610 615 620

cca gac gag atc gcc agc cag cag atc aat cgg gaa ctg gcg ctg gct 1920
Pro Asp Glu Ile Ala Ser Gln Gln Ile Asn Arg Glu Leu Ala Leu Ala
625 630 635 640

aac tac gcc acc atg tcc ggg atc tat gcc cag acg gcg gcc ttg atc 1968
Asn Tyr Ala Thr Met Ser Gly Ile Tyr Ala Gln Thr Ala Ala Leu Ile
645 650 655

gaa aac gct gcc gcc atg gga caa gcc ttt gac gcc gcc aag aac gac 2016
Glu Asn Ala Ala Ala Met Gly Gln Ala Phe Asp Ala Ala Lys Asn Asp
660 665 670

gac tcc ttc tat ctg ccg ccg gag gct ttt gac aac cca gat ttc cag 2064
Asp Ser Phe Tyr Leu Pro Pro Glu Ala Phe Asp Asn Pro Asp Phe Gln
675 680 685

cgc ggc ctg aaa ttg ttc ctg tcg gca gac ggt aag gcg gct cgg atg 2112
Arg Gly Leu Lys Leu Phe Leu Ser Ala Asp Gly Lys Ala Ala Arg Met
690 695 700

atc atc tcc cat gaa ggc gat ccc gcc acc ccc gaa ggc att tcg cat 2160
Ile Ile Ser His Glu Gly Asp Pro Ala Thr Pro Glu Gly Ile Ser His
705 710 715 720

atc gac gcg atc aag cag gcg gcc cac gag gcc gtg aag ggc act ccc 2208
Ile Asp Ala Ile Lys Gln Ala Ala His Glu Ala Val Lys Gly Thr Pro
725 730 735

atg gcg ggt gct ggg atc tat ctg gcc ggc acg gcc gcc acc ttc aag 2256
Met Ala Gly Ala Gly Ile Tyr Leu Ala Gly Thr Ala Ala Thr Phe Lys
740 745 750

gac att caa gac ggc gcc acc tac gac ctc ctg atc gcc gga ata gcc 2304
Asp Ile Gln Asp Gly Ala Thr Tyr Asp Leu Leu Ile Ala Gly Ile Ala
755 760 765

gcg ctg agc ttg att ttg ctc atc atg atg atc att acc cga agc ctg 2352
Ala Leu Ser Leu Ile Leu Leu Ile Met Met Ile Ile Thr Arg Ser Leu
770 775 780

gtt gcg gcg ctg gtg atc gtg ggc acg gtg gcg ctg tcg ttg ggc gct 2400
Val Ala Ala Leu Val Ile Val Gly Thr Val Ala Leu Ser Leu Gly Ala
785 790 795 800

tct ttt ggc ctg tcc gtg ctg gtg tgg cag cat ctt ctc ggt atc cag 2448
Ser Phe Gly Leu Ser Val Leu Val Trp Gln His Leu Leu Gly Ile Gln
805 810 815

ttg tac tgg atc gtg ctc gcg ctg gcc gtc atc ctg ctc ctg gcc gtg 2496
Leu Tyr Trp Ile Val Leu Ala Leu Ala Val Ile Leu Leu Leu Ala Val
820 825 830

gga tcg gac tat aac ttg ctg ctg att tcc cga ttc aag gag gag atc 2544
Gly Ser Asp Tyr Asn Leu Leu Leu Ile Ser Arg Phe Lys Glu Glu Ile
835 840 845

ggt gca ggt ttg aac acc ggc atc atc cgt gcg atg gcc ggc acc ggc 2592
Gly Ala Gly Leu Asn Thr Gly Ile Ile Arg Ala Met Ala Gly Thr Gly
850 855 860

ggg gtg gtg acc gct gcc ggc ctg gtg ttc gcc gcc act atg tct tcg 2640
Gly Val Val Thr Ala Ala Gly Leu Val Phe Ala Ala Thr Met Ser Ser
865 870 875 880

ttc gtg ttc agt gat ttg cgg gtc ctc ggt cag atc ggg acc acc att 2688
Phe Val Phe Ser Asp Leu Arg Val Leu Gly Gln Ile Gly Thr Thr Ile
885 890 895

ggt ctt ggg ctg ctg ttc gac acg ctg gtg gtg cgc gcg ttc atg acc 2736
Gly Leu Gly Leu Leu Phe Asp Thr Leu Val Val Arg Ala Phe Met Thr
900 905 910

ccg tcc atc gcg gtg ctg ctc ggg cgc tgg ttc tgg tgg ccg caa cga 2784
Pro Ser Ile Ala Val Leu Leu Gly Arg Trp Phe Trp Trp Pro Gln Arg
915 920 925

gtg cgc ccg cgc cct gcc agc agg atg ctt cgg ccg tac ggc ccg cgg 2832
Val Arg Pro Arg Pro Ala Ser Arg Met Leu Arg Pro Tyr Gly Pro Arg
930 935 940

ccc gtg gtt cgt gaa ttg ctg ctg cgc gag ggc aac gat gac ccg aga 2880
Pro Val Val Arg Glu Leu Leu Leu Arg Glu Gly Asn Asp Asp Pro Arg
945 950 955 960

act cag gtg gct acc cac cgt taa 2904
Thr Gln Val Ala Thr His Arg
965



<210> 6 
<211> 967 
<212> PRT 

<213> Mycobacterium complex 
<220> 
<223> mmpL6 protein

<400> 6

Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro His Thr Ile Arg
1 5 10 15

Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly Val Ala Ala Ile
20 25 30

Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly Glu Ala His Asn
35 40 45

Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln Ala Met Lys Arg
50 55 60

Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser Ala Ala Met Ile
65 70 75 80

Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala His Arg Phe Tyr
85 90 95

Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys His Val Glu His
100 105 110

Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala Gly Ser Gln Ser
115 120 125

Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu Ala Gly Asn Gln
130 135 140

Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val Arg Asp Ile Val
145 150 155 160

Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr Val Thr Gly Ala
165 170 175

Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser Lys Gly Thr Ala
180 185 190

Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val Met Leu Leu Phe
195 200 205

Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu Ile Thr Val Leu
210 215 220

Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe Leu Gly Asn Ala
225 230 235 240

Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu Leu Thr Leu Leu
245 250 255

Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val Leu Gly Arg Tyr
260 265 270

His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr Ala Phe Tyr Thr
275 280 285

Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser Gly Leu Thr Val
290 295 300

Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu Pro Tyr Phe Gln
305 310 315 320

Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile Ala Leu Ala Ala
325 330 335

Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly Ser Arg Phe Gly
340 345 350

Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly Trp Arg Arg Ile
355 360 365

Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu Ala Val Ala Cys
370 375 380

Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly Tyr Lys Thr Ser
385 390 395 400

Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro Ala Asn Ile Gly
405 410 415

Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg Leu Asn Pro Glu
420 425 430

Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn Pro Ala Asp Met
435 440 445

Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His Leu Pro Gly Ile
450 455 460

Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr Pro Ile Asp His
465 470 475 480

Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val Gly Gln Ile Gln
485 490 495

Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu Leu Lys Gln Ala
500 505 510

Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg Gln Tyr Ala Leu
515 520 525

Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala Glu Ser Phe His
530 535 540

Gln Thr Ile Ala Thr Val Lys Glu Leu Arg Asp Arg Ile Ala Asn Phe
545 550 555 560

Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr Trp Glu Lys His
565 570 575

Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu Arg Ser Val Phe Asp Thr
580 585 590

Ile Asp Gly Ile Asp Gln Leu Gly Glu Gln Leu Ala Ser Val Thr Val
595 600 605

Thr Leu Asp Lys Leu Ala Ala Ile Gln Pro Gln Leu Val Ala Leu Leu
610 615 620

Pro Asp Glu Ile Ala Ser Gln Gln Ile Asn Arg Glu Leu Ala Leu Ala
625 630 635 640

Asn Tyr Ala Thr Met Ser Gly Ile Tyr Ala Gln Thr Ala Ala Leu Ile
645 650 655

Glu Asn Ala Ala Ala Met Gly Gln Ala Phe Asp Ala Ala Lys Asn Asp
660 665 670

Asp Ser Phe Tyr Leu Pro Pro Glu Ala Phe Asp Asn Pro Asp Phe Gln
675 680 685

Arg Gly Leu Lys Leu Phe Leu Ser Ala Asp Gly Lys Ala Ala Arg Met
690 695 700

Ile Ile Ser His Glu Gly Asp Pro Ala Thr Pro Glu Gly Ile Ser His
705 710 715 720

Ile Asp Ala Ile Lys Gln Ala Ala His Glu Ala Val Lys Gly Thr Pro
725 730 735

Met Ala Gly Ala Gly Ile Tyr Leu Ala Gly Thr Ala Ala Thr Phe Lys
740 745 750

Asp Ile Gln Asp Gly Ala Thr Tyr Asp Leu Leu Ile Ala Gly Ile Ala
755 760 765

Ala Leu Ser Leu Ile Leu Leu Ile Met Met Ile Ile Thr Arg Ser Leu
770 775 780

Val Ala Ala Leu Val Ile Val Gly Thr Val Ala Leu Ser Leu Gly Ala
785 790 795 800

Ser Phe Gly Leu Ser Val Leu Val Trp Gln His Leu Leu Gly Ile Gln
805 810 815

Leu Tyr Trp Ile Val Leu Ala Leu Ala Val Ile Leu Leu Leu Ala Val
820 825 830

Gly Ser Asp Tyr Asn Leu Leu Leu Ile Ser Arg Phe Lys Glu Glu Ile
835 840 845

Gly Ala Gly Leu Asn Thr Gly Ile Ile Arg Ala Met Ala Gly Thr Gly
850 855 860

Gly Val Val Thr Ala Ala Gly Leu Val Phe Ala Ala Thr Met Ser Ser
865 870 875 880

Phe Val Phe Ser Asp Leu Arg Val Leu Gly Gln Ile Gly Thr Thr Ile
885 890 895

Gly Leu Gly Leu Leu Phe Asp Thr Leu Val Val Arg Ala Phe Met Thr
900 905 910

Pro Ser Ile Ala Val Leu Leu Gly Arg Trp Phe Trp Trp Pro Gln Arg
915 920 925

Val Arg Pro Arg Pro Ala Ser Arg Met Leu Arg Pro Tyr Gly Pro Arg
930 935 940

Pro Val Val Arg Glu Leu Leu Leu Arg Glu Gly Asn Asp Asp Pro Arg
945 950 955 960

Thr Gln Val Ala Thr His Arg
965



<210> 7 
<211> 1758 
<212> DNA 

<213> Mycobacterium complex 

<220> 

<221> CDS 

<222> (1)..(1758)

<220> 

<223> mmpL6 truncated coding sequence and protein

<400> 7

atg agc aac cac cac cgc ccg cgg cct tgg ttg ccg cac acc atc cga 48
Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro His Thr Ile Arg
1 5 10 15

cgg ctt tcg ttg ccg atc ttg ctg ttt tgg gtg ggt gtg gcc gcc ata 96
Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly Val Ala Ala Ile
20 25 30

acc aat gcc gcc gtg ccg caa ttg gag gtg gtc ggg gag gcg cat aac 144
Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly Glu Ala His Asn
35 40 45

gtc gca cag agc tcc ccg gat gac ccg tcg ctg cag gcg atg aaa cgc 192
Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln Ala Met Lys Arg
50 55 60

atc ggc aag gtg ttc cac gag ttc gat tcc gac agt gcg gcc atg atc 240
Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser Ala Ala Met Ile
65 70 75 80

gtc ttg gaa ggc gat aag ccg ctc ggc aac gac gcc cac cgg ttc tac 288
Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala His Arg Phe Tyr
85 90 95

gac acc ctg ctc cgc aac ctt tca aac gac acc aaa cac gtc gag cac 336
Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys His Val Glu His
100 105 110

gtt cag gac ttc tgg ggc gat ccg ctg acc gcg gcc ggc tcg caa agc 384
Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala Gly Ser Gln Ser
115 120 125

acc gac ggc aaa gcc gcc tac gtt cag gtc tat ctc gcc ggc aac caa 432
Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu Ala Gly Asn Gln
130 135 140

ggc gag gcg ttg tca atc gag tcc gtc gac gcg gtg cgc gac atc gtc 480
Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val Arg Asp Ile Val
145 150 155 160

gcc cat acg cca cca ccg gcc ggg gtc aag gcc tac gtc acc ggc gcg 528
Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr Val Thr Gly Ala
165 170 175

gcc ccg ctc atg gcc gat cag ttt cag gtg ggc agc aaa gga acc gcg 576
Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser Lys Gly Thr Ala
180 185 190



225 

ggg 



305 



310 315 320 



624 



aaa gtt acc ggg ata act ctg gtt gtg ate gcg gtg atg ttg etc ttc 

Lys Val Thr Gly lie Thr Leu Val Val He Ala Val Met Leu Leu Phe 
195 200 205 

gta tac cgt tec gtc gtc acc atg gtc ctg gtg ctt ate acg gtt ctt 672 

Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu He Thr Val Leu 
210 215 220 

att gag ttg gcc gcg gcc cgc ggg ate gtc get ttt etc gga aac gcc 720 

He Glu Leu Ala Ala Ala Arg Gly He Val Ala Phe Leu Gly Asn Ala 

230 235 240 



gta ate ggg ctg teg aca tac teg acg aat ctg etc aca eta ttg 768 
Gly Val He Gly Leu Ser Thr Tyr Ser Thr Asn Leu Leu Thr Leu Leu 

245 250 255 



gta ate gcg gcg ggc aca gac tac gcg att ttt gtc etc ggc cgc tat 816 
Val He Ala Ala Gly Thr Asp Tyr Ala He Phe Val Leu Gly Arg Tyr 

260 265 270 

cac gag gcg cgc tac gcc gca cag gat egg gaa acg gcc ttc tac acg 864 
His Glu Ala Arg Tyr Ala Ala Gin Asp Arg Glu Thr Ala Phe Tyr Thr 
275 280 285 

atg tat cgc ggg acc gcc cac gtc gtc ttg ggc teg ggt ctg acc gtt 912 
Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser Gly Leu Thr Val 
290 295 300 

gcc ggc gcg gtg tat tgc ctg age ttt acc egg eta ccc tat ttt caa 960 
Ala Gly Ala Val Tyr Cys Leu' Ser Phe Thr Arg Leu Pro Tyr Phe Gin 



1008 



age ctg ggt att ccc gcc teg ata ggg gtg atg att gcg ttg gca gcc 
Ser Leu Gly He Pro Ala Ser He Gly Val Met He Ala Leu Ala Ala 

325 330 335 

gcg etc age ctg gcc cca tec gtg etc ate ttg ggc agt cgt ttc ggt 1056 
Ala Leu Ser Leu Ala Pro Ser Val Leu He Leu Gly Ser Arg Phe Gly 

340 345 350 

tgt ttc gaa ccc aag cgc agg atg agg acc agg gga tgg egg cgc ate 1104 
Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly Trp Arg Arg He 
355 360 365 

ggc acg gcc ate gtg cgt tgg ccg gga ccc ate ctg gca gtg gcg tgc 1152 
Gly Thr Ala He Val Arg Trp Pro Gly Pro He Leu Ala Val Ala Cys 
370 375 380 

gca att gcg gtg gtg ggt ctg etc gcg ctg ccg gga tac aaa acg age 1200 
Ala lie Ala Val Val Gly Leu Leu Ala Leu Pro Gly Tyr Lys Thr Ser 
385 390 395 400 

tac gac get cgc tat tac atg ccc gcc acc gcc ccg gcc aat att ggc 124 8 
Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro Ala Asn He Gly 
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405 410 415 

tac atg gcc gcg gag cga cat ttt ccc caa gcg egg ctg aat cec gaa 1296 

Tyr Met Ala Ala Glu Arg His Phe Pro Gin Ala Arg Leu Asn Pro Glu 

420 425 430 

eta ctg atg ate gag acg gat cac gat atg cgc aat ccg gcc gac atg 1344 

Leu Leu Met lie Glu Thr Asp His Asp Met Arg Asn Pro Ala Asp Met 

435 440 445 

etc ate ttg gat agg ate gcc aag get gtc ttc cat ctg ccc ggc ata 13 92 

Leu lie Leu Asp 'Arg lie Ala Lys Ala Val Phe His Leu Pro Gly lie 

450 455 460 

ggg ctg gtg cag gcc atg ace egg ccg eta gga acc ccg att gac cac 144 0 

Gly Leu Val Gin Ala Met Thr Arg Pro Leu Gly Thr Pro lie Asp His 

465 470 475 480 

age teg ata ccg ttt cag ate age atg caa age gtc ggc cag att cag 1488 

Ser Ser lie Pro Phe Gin lie Ser Met Gin Ser Val Gly Gin He Gin 

485 490 495 

aat etc aag tat cag agg gac cga gca gcc gac ttg ctg aag cag gcc 153 6 

Asn Leu Lys Tyr Gin Arg Asp Arg Ala Ala Asp Leu Leu Lys Gin Ala 

500 505 510 

gaa gag ctg ggg aag acg ate gaa ate ttg cag cgc caa tat gcc eta 15 84 

Glu Glu Leu Gly Lys Thr He Glu He Leu Gin Arg Gin Tyr Ala Leu 

515 520 525 

cag. cag gaa etc gcg gcc get act cac gag caa gcc gaa age ttt cac 1632 

Gin Gin Glu Leu Ala Ala Ala Thr His Glu Gin Ala Glu Ser Phe His 

530 535 540 



caa acg ate gcc acg gta aag gaa ctg cga gat agg ate gcc aat ttc 168 0 

Gin Thr lie Ala Thr Val Lys Glu Leu Arg Asp Arg He Ala Asn Phe 

545 550 ' 555 560 

gac gat ttc ttc agg ccg att cgt agt tac ttt tac tgg gaa aag cac 172 8 

Asp Asp Phe Phe Arg Pro lie Arg Ser Tyr Phe Tyr Trp Glu Lys His 

565 570 575 

tgc tac gat ate ccg age tgc tgg gcg ctg 175 8 

Cys Tyr Asp He Pro Ser Cys Trp Ala Leu 

580 585 



<210> 8
<211> 586
<212> PRT
<213> Mycobacterium complex

<220>
<223> mmpL6 truncated protein

<400> 8

Met Ser Asn His His Arg Pro Arg Pro Trp Leu Pro His Thr Ile Arg
1 5 10 15

Arg Leu Ser Leu Pro Ile Leu Leu Phe Trp Val Gly Val Ala Ala Ile
20 25 30

Thr Asn Ala Ala Val Pro Gln Leu Glu Val Val Gly Glu Ala His Asn
35 40 45

Val Ala Gln Ser Ser Pro Asp Asp Pro Ser Leu Gln Ala Met Lys Arg
50 55 60

Ile Gly Lys Val Phe His Glu Phe Asp Ser Asp Ser Ala Ala Met Ile
65 70 75 80

Val Leu Glu Gly Asp Lys Pro Leu Gly Asn Asp Ala His Arg Phe Tyr
85 90 95

Asp Thr Leu Leu Arg Asn Leu Ser Asn Asp Thr Lys His Val Glu His
100 105 110

Val Gln Asp Phe Trp Gly Asp Pro Leu Thr Ala Ala Gly Ser Gln Ser
115 120 125

Thr Asp Gly Lys Ala Ala Tyr Val Gln Val Tyr Leu Ala Gly Asn Gln
130 135 140

Gly Glu Ala Leu Ser Ile Glu Ser Val Asp Ala Val Arg Asp Ile Val
145 150 155 160

Ala His Thr Pro Pro Pro Ala Gly Val Lys Ala Tyr Val Thr Gly Ala
165 170 175

Ala Pro Leu Met Ala Asp Gln Phe Gln Val Gly Ser Lys Gly Thr Ala
180 185 190

Lys Val Thr Gly Ile Thr Leu Val Val Ile Ala Val Met Leu Leu Phe
195 200 205

Val Tyr Arg Ser Val Val Thr Met Val Leu Val Leu Ile Thr Val Leu
210 215 220

Ile Glu Leu Ala Ala Ala Arg Gly Ile Val Ala Phe Leu Gly Asn Ala
225 230 235 240

Gly Val Ile Gly Leu Ser Thr Tyr Ser Thr Asn Leu Leu Thr Leu Leu
245 250 255

Val Ile Ala Ala Gly Thr Asp Tyr Ala Ile Phe Val Leu Gly Arg Tyr
260 265 270

His Glu Ala Arg Tyr Ala Ala Gln Asp Arg Glu Thr Ala Phe Tyr Thr
275 280 285

Met Tyr Arg Gly Thr Ala His Val Val Leu Gly Ser Gly Leu Thr Val
290 295 300

Ala Gly Ala Val Tyr Cys Leu Ser Phe Thr Arg Leu Pro Tyr Phe Gln
305 310 315 320

Ser Leu Gly Ile Pro Ala Ser Ile Gly Val Met Ile Ala Leu Ala Ala
325 330 335

Ala Leu Ser Leu Ala Pro Ser Val Leu Ile Leu Gly Ser Arg Phe Gly
340 345 350

Cys Phe Glu Pro Lys Arg Arg Met Arg Thr Arg Gly Trp Arg Arg Ile
355 360 365

Gly Thr Ala Ile Val Arg Trp Pro Gly Pro Ile Leu Ala Val Ala Cys
370 375 380

Ala Ile Ala Val Val Gly Leu Leu Ala Leu Pro Gly Tyr Lys Thr Ser
385 390 395 400

Tyr Asp Ala Arg Tyr Tyr Met Pro Ala Thr Ala Pro Ala Asn Ile Gly
405 410 415

Tyr Met Ala Ala Glu Arg His Phe Pro Gln Ala Arg Leu Asn Pro Glu
420 425 430

Leu Leu Met Ile Glu Thr Asp His Asp Met Arg Asn Pro Ala Asp Met
435 440 445

Leu Ile Leu Asp Arg Ile Ala Lys Ala Val Phe His Leu Pro Gly Ile
450 455 460

Gly Leu Val Gln Ala Met Thr Arg Pro Leu Gly Thr Pro Ile Asp His
465 470 475 480

Ser Ser Ile Pro Phe Gln Ile Ser Met Gln Ser Val Gly Gln Ile Gln
485 490 495

Asn Leu Lys Tyr Gln Arg Asp Arg Ala Ala Asp Leu Leu Lys Gln Ala
500 505 510

Glu Glu Leu Gly Lys Thr Ile Glu Ile Leu Gln Arg Gln Tyr Ala Leu
515 520 525

Gln Gln Glu Leu Ala Ala Ala Thr His Glu Gln Ala Glu Ser Phe His
530 535 540

Gln Thr Ile Ala Thr Val Lys Glu Leu Arg Asp Arg Ile Ala Asn Phe
545 550 555 560

Asp Asp Phe Phe Arg Pro Ile Arg Ser Tyr Phe Tyr Trp Glu Lys His
565 570 575

Cys Tyr Asp Ile Pro Ser Cys Trp Ala Leu
580 585



<210> 9
<211> 447
<212> DNA
<213> Mycobacterium complex

<220>
<221> CDS
<222> (1)..(444)

<220>
<223> mmpS6 coding sequence and protein

<400> 9

gtg cag ggg att tca gtg act ggc ctg gtc aaa cgc ggc tgg atg gtg 48
Val Gln Gly Ile Ser Val Thr Gly Leu Val Lys Arg Gly Trp Met Val
1 5 10 15

ctg gtt gcc gtg gcg gtg gtg gcg gtc gcg gga ttc agc gtc tat cgg 96
Leu Val Ala Val Ala Val Val Ala Val Ala Gly Phe Ser Val Tyr Arg
20 25 30

ttg cac ggc atc ttc ggc tcg cac gac acc acc tcg acc gcc ggt ggt 144
Leu His Gly Ile Phe Gly Ser His Asp Thr Thr Ser Thr Ala Gly Gly
35 40 45

gtc gcg aac gac atc aag ccg ttc aac ccc aaa cag gta acc ctc gag 192
Val Ala Asn Asp Ile Lys Pro Phe Asn Pro Lys Gln Val Thr Leu Glu
50 55 60

gtc ttt ggc gct ccc gga acc gtg gca acg atc aat tat ctg gac gtg 240
Val Phe Gly Ala Pro Gly Thr Val Ala Thr Ile Asn Tyr Leu Asp Val
65 70 75 80

gat gcc aca cct cgg caa gtc ctg gac acg acc ctg ccg tgg tca tac 288
Asp Ala Thr Pro Arg Gln Val Leu Asp Thr Thr Leu Pro Trp Ser Tyr
85 90 95

acg atc acg acg acc ctg ccc gcg gtc ttc gcc aat gtt gtc gcg caa 336
Thr Ile Thr Thr Thr Leu Pro Ala Val Phe Ala Asn Val Val Ala Gln
100 105 110

ggc gac agc aat tcc atc ggc tgc cgc atc acc gtc aac ggt gta gtc 384
Gly Asp Ser Asn Ser Ile Gly Cys Arg Ile Thr Val Asn Gly Val Val
115 120 125

aag gac gaa agg atc gtc aac gaa gtg cgc gcc tat acc ttc tgc ctc 432
Lys Asp Glu Arg Ile Val Asn Glu Val Arg Ala Tyr Thr Phe Cys Leu
130 135 140

gac aag tcc tca tga 447
Asp Lys Ser Ser
145



<210> 10
<211> 148
<212> PRT
<213> Mycobacterium complex

<220>
<223> mmpS6 protein

<400> 10

Val Gln Gly Ile Ser Val Thr Gly Leu Val Lys Arg Gly Trp Met Val
1 5 10 15

Leu Val Ala Val Ala Val Val Ala Val Ala Gly Phe Ser Val Tyr Arg
20 25 30

Leu His Gly Ile Phe Gly Ser His Asp Thr Thr Ser Thr Ala Gly Gly
35 40 45

Val Ala Asn Asp Ile Lys Pro Phe Asn Pro Lys Gln Val Thr Leu Glu
50 55 60

Val Phe Gly Ala Pro Gly Thr Val Ala Thr Ile Asn Tyr Leu Asp Val
65 70 75 80

Asp Ala Thr Pro Arg Gln Val Leu Asp Thr Thr Leu Pro Trp Ser Tyr
85 90 95

Thr Ile Thr Thr Thr Leu Pro Ala Val Phe Ala Asn Val Val Ala Gln
100 105 110

Gly Asp Ser Asn Ser Ile Gly Cys Arg Ile Thr Val Asn Gly Val Val
115 120 125

Lys Asp Glu Arg Ile Val Asn Glu Val Arg Ala Tyr Thr Phe Cys Leu
130 135 140

Asp Lys Ser Ser
145



<210> 11
<211> 399
<212> DNA
<213> Mycobacterium complex

<220>
<221> CDS
<222> (1)..(399)

<220>
<223> mmpS6 truncated coding sequence and protein

<400> 11

ctg gtt gcc gtg gcg gtg gtg gcg gtc gcg gga ttc agc gtc tat cgg 48
Leu Val Ala Val Ala Val Val Ala Val Ala Gly Phe Ser Val Tyr Arg
1 5 10 15

ttg cac ggc atc ttc ggc tcg cac gac acc acc tcg acc gcc ggt ggt 96
Leu His Gly Ile Phe Gly Ser His Asp Thr Thr Ser Thr Ala Gly Gly
20 25 30

gtc gcg aac gac atc aag ccg ttc aac ccc aaa cag gta acc ctc gag 144
Val Ala Asn Asp Ile Lys Pro Phe Asn Pro Lys Gln Val Thr Leu Glu
35 40 45

gtc ttt ggc gct ccc gga acc gtg gca acg atc aat tat ctg gac gtg 192
Val Phe Gly Ala Pro Gly Thr Val Ala Thr Ile Asn Tyr Leu Asp Val
50 55 60

gat gcc aca cct cgg caa gtc ctg gac acg acc ctg ccg tgg tca tac 240
Asp Ala Thr Pro Arg Gln Val Leu Asp Thr Thr Leu Pro Trp Ser Tyr
65 70 75 80

acg atc acg acg acc ctg ccc gcg gtc ttc gcc aat gtt gtc gcg caa 288
Thr Ile Thr Thr Thr Leu Pro Ala Val Phe Ala Asn Val Val Ala Gln
85 90 95

ggc gac agc aat tcc atc ggc tgc cgc atc acc gtc aac ggt gta gtc 336
Gly Asp Ser Asn Ser Ile Gly Cys Arg Ile Thr Val Asn Gly Val Val
100 105 110

aag gac gaa agg atc gtc aac gaa gtg cgc gcc tat acc ttc tgc ctc 384
Lys Asp Glu Arg Ile Val Asn Glu Val Arg Ala Tyr Thr Phe Cys Leu
115 120 125

gac aag tcc tca tga 399
Asp Lys Ser Ser
130



<210> 12
<211> 132
<212> PRT
<213> Mycobacterium complex

<220>
<223> mmpS6 truncated protein

<400> 12

Leu Val Ala Val Ala Val Val Ala Val Ala Gly Phe Ser Val Tyr Arg
1 5 10 15

Leu His Gly Ile Phe Gly Ser His Asp Thr Thr Ser Thr Ala Gly Gly
20 25 30

Val Ala Asn Asp Ile Lys Pro Phe Asn Pro Lys Gln Val Thr Leu Glu
35 40 45

Val Phe Gly Ala Pro Gly Thr Val Ala Thr Ile Asn Tyr Leu Asp Val
50 55 60

Asp Ala Thr Pro Arg Gln Val Leu Asp Thr Thr Leu Pro Trp Ser Tyr
65 70 75 80

Thr Ile Thr Thr Thr Leu Pro Ala Val Phe Ala Asn Val Val Ala Gln
85 90 95

Gly Asp Ser Asn Ser Ile Gly Cys Arg Ile Thr Val Asn Gly Val Val
100 105 110

Lys Asp Glu Arg Ile Val Asn Glu Val Arg Ala Tyr Thr Phe Cys Leu
115 120 125

Asp Lys Ser Ser
130



<210> 13 
<211> 20 
<212> DNA 

<213> Mycobacterium complex 
<400> 13 

cgttcaaccc caaacaggta 20



<210> 14 
<211> 20 
<212> DNA 

<213> Mycobacterium complex 
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<400> 14 

aatcgaactc gtggaacacc 20

<210> 15 
<211> 20 
<212> DNA 

<213> Mycobacterium complex 
<400> 15 

attcagcgtc tatcggttgc 20 

<210> 16
<211> 20 
<212> DNA 

<213> Mycobacterium complex 
<400> 16 

agcagctcgg gatatcgtag 20

<210> 17 
<211> 20 
<212> DNA 

<213> Mycobacterium complex 



<400> 17 

ctacctcatc ttccggtcca 20

<210> 18
<211> 20
<212> DNA
<213> Mycobacterium complex

<400> 18

catagatccc ggacatggtg 20



<210> 19
<211> 2390
<212> DNA
<213> Mycobacterium canettii

<220>
<221> CDS
<222> (517)..(2307)

<400> 19

gatcccgtcg ccgcggcgct ggagctggcc gccgggcccg cagccgcccc gcgcgaggtc 60
gtgctggcga gcaaagccac catgcgcgcc acagccagcc ccggatcgct ggaccttgag 120
caacacgaac tcgccaaacg cttagaactt gggccgcagg cgaaatcggt ccagtcgccc 180
gagttcgccg ctcgcttggc tgccgctcaa cacaggtagc gcctaccagc ctcgctggtt 240
tccatggcgt gccccagtcc gaagctgctg ctgcttgact ccgcgcgctg ggcccgagcg 300
cgcgctgttg tacggcccaa acggcgtgtc ggtgtacagt cgcgcgctcg cggcttcagt 360
ccggcccccc gactccggca ggcccgacgg cgcccagcgc tagcccgaag ttcccccttg 420
taggggcggg ctgagtttcg atctgtttcg tgagcaggtg tttctgtgtt caacttccct 480

caacatgtac tcatgtatta ttgagaatag ctcggc gtg tca tcc tct gat gac 534
Val Ser Ser Ser Asp Asp
1 5

gct att atc gcg ctg acc gcg tgt tat aaa gta atc atg tac att acc 582
Ala Ile Ile Ala Leu Thr Ala Cys Tyr Lys Val Ile Met Tyr Ile Thr
10 15 20

cgg gta ccc aac cgg gga tcc ccg ccg gcg gtg ctg ttg cgg gaa agc 630
Arg Val Pro Asn Arg Gly Ser Pro Pro Ala Val Leu Leu Arg Glu Ser
25 30 35

ttc cgc gaa aac ggc aag gtc aag acg cgt acc ctg gcc aac ctc tca 678
Phe Arg Glu Asn Gly Lys Val Lys Thr Arg Thr Leu Ala Asn Leu Ser
40 45 50

cgc tgg ccc gag cac aag ctg gac aga ctg gac cgg gcg ctt aag ggc 726
Arg Trp Pro Glu His Lys Leu Asp Arg Leu Asp Arg Ala Leu Lys Gly
55 60 65 70

ttg ccg ccc gcg gac tgg gat cta gcc gag gcc ttc gat atc acc cgc 774
Leu Pro Pro Ala Asp Trp Asp Leu Ala Glu Ala Phe Asp Ile Thr Arg
75 80 85

agc ctg ccg cac ggg cat gtg gcc gcg gtg gcc ggc acc gcc gag aag 822
Ser Leu Pro His Gly His Val Ala Ala Val Ala Gly Thr Ala Glu Lys
90 95 100

ctg ggc ata ccc gag ctg atc gac ccc acc ccg tcg cgg cgg cgc aac 870
Leu Gly Ile Pro Glu Leu Ile Asp Pro Thr Pro Ser Arg Arg Arg Asn
105 110 115

ctg gtg ctg gcc atg ctg atc ggg cag atc atc gag ccc gga tcg aaa 918
Leu Val Leu Ala Met Leu Ile Gly Gln Ile Ile Glu Pro Gly Ser Lys
120 125 130

ctg gcg atc gcg cgc ggg ctg cgc gcc cag acc gcc acc agc acg ctg 966
Leu Ala Ile Ala Arg Gly Leu Arg Ala Gln Thr Ala Thr Ser Thr Leu
135 140 145 150

ggt gcg gtg ctg ggt gtc tcg ggc gcc gat gag gac gac ctg tat gac 1014
Gly Ala Val Leu Gly Val Ser Gly Ala Asp Glu Asp Asp Leu Tyr Asp
155 160 165

gcg atg gac tgg gcg ctg gag cgc aaa gac ggc atc gaa aac gcc ttg 1062
Ala Met Asp Trp Ala Leu Glu Arg Lys Asp Gly Ile Glu Asn Ala Leu
170 175 180

gcc gca cgg cat ctg acc aac ggc acc ctg gtg ctc tat gac gta tcc 1110
Ala Ala Arg His Leu Thr Asn Gly Thr Leu Val Leu Tyr Asp Val Ser
185 190 195

tcg gcg gcg ttc gag ggc cac acc tgc ccg ctg gga gcg atc ggg cac 1158
Ser Ala Ala Phe Glu Gly His Thr Cys Pro Leu Gly Ala Ile Gly His
200 205 210

gcc cgc gac ggg gtc aaa ggc cgg ctg cag atc gtc tac ggg ctg ctg 1206
Ala Arg Asp Gly Val Lys Gly Arg Leu Gln Ile Val Tyr Gly Leu Leu
215 220 225 230

tgc tca ccc aag gga gcg ccg gtg gcc atc gag gtg ttc aag ggc aac 1254
Cys Ser Pro Lys Gly Ala Pro Val Ala Ile Glu Val Phe Lys Gly Asn
235 240 245

acc gcc gac ccg aaa act ctg aaa gct caa atc gac aag ctc aaa acc 1302
Thr Ala Asp Pro Lys Thr Leu Lys Ala Gln Ile Asp Lys Leu Lys Thr
250 255 260

cgg ttc ggg ttg acc cgc atc gcc ctg gtg ggc gat cgg ggc atg ctc 1350
Arg Phe Gly Leu Thr Arg Ile Ala Leu Val Gly Asp Arg Gly Met Leu
265 270 275

act tcc gcg cgc atc cgt gac gag ctg cgt ccg gcg cac ctg gat tgg 1398
Thr Ser Ala Arg Ile Arg Asp Glu Leu Arg Pro Ala His Leu Asp Trp
280 285 290

atc agc gcg ctg cgc gcc ccg cag atc aag atc ctg ctc gag gac ggg 1446
Ile Ser Ala Leu Arg Ala Pro Gln Ile Lys Ile Leu Leu Glu Asp Gly
295 300 305 310

gcg ctg cag ctg tcg ctg ttc gat gag cag aac ctg ttc gag atc act 1494
Ala Leu Gln Leu Ser Leu Phe Asp Glu Gln Asn Leu Phe Glu Ile Thr
315 320 325

cac ccc gac tat ccc ggt gag cgg ctg gtg tgc tgc cac aac ccc gcc 1542
His Pro Asp Tyr Pro Gly Glu Arg Leu Val Cys Cys His Asn Pro Ala
330 335 340

ctg gcc gac gag cgc gcc cgc aaa cgc gcc gag ctg ctg gcg gcc acc 1590
Leu Ala Asp Glu Arg Ala Arg Lys Arg Ala Glu Leu Leu Ala Ala Thr
345 350 355

gaa aag gag ctg cag gcc atc gcc gaa gcc acc cgc cgc caa cgc cgg 1638
Glu Lys Glu Leu Gln Ala Ile Ala Glu Ala Thr Arg Arg Gln Arg Arg
360 365 370

ccg tta cgc ggt aca gac aag atc ggc ctg cgg gtg ggc aag gtg cgc 1686
Pro Leu Arg Gly Thr Asp Lys Ile Gly Leu Arg Val Gly Lys Val Arg
375 380 385 390

aac aag ttc aag atg gcc aag cac ttt gac ctg cac atc acc gat gag 1734
Asn Lys Phe Lys Met Ala Lys His Phe Asp Leu His Ile Thr Asp Glu
395 400 405

gcc ttc agc ttc acc cgc aac cag aac agt atc gcc gcc gag gcc gcc 1782
Ala Phe Ser Phe Thr Arg Asn Gln Asn Ser Ile Ala Ala Glu Ala Ala
410 415 420

ctc gac ggc atc tac gtg cta cgc acc agc ctg ccc gac aac gcc ctg 1830
Leu Asp Gly Ile Tyr Val Leu Arg Thr Ser Leu Pro Asp Asn Ala Leu
425 430 435

ggc cgc gac gac gtg gtg ggc cgc tac aaa gac ctc gcc gac gtc gaa 1878
Gly Arg Asp Asp Val Val Gly Arg Tyr Lys Asp Leu Ala Asp Val Glu
440 445 450

cgc ttc ttc cgc acc ctc aac agc gaa ctg gac gta cgc ccc atc cgg 1926
Arg Phe Phe Arg Thr Leu Asn Ser Glu Leu Asp Val Arg Pro Ile Arg
455 460 465 470

cat cgg ctg gcc gac cgg gtc cgc gcc cac atg ttc ttg cac atg ctc 1974
His Arg Leu Ala Asp Arg Val Arg Ala His Met Phe Leu His Met Leu
475 480 485

tcc tac tac atc agc tgg cac atg aaa caa gcc ctg gcc cca atc ctg 2022
Ser Tyr Tyr Ile Ser Trp His Met Lys Gln Ala Leu Ala Pro Ile Leu
490 495 500

ttc acc gac aac gac aaa ccc gcc gcc gcc gcc aaa cgc gcc gac ccc 2070
Phe Thr Asp Asn Asp Lys Pro Ala Ala Ala Ala Lys Arg Ala Asp Pro
505 510 515

gtc gcg cca gcc caa cgc tcc gac gaa gcg ctg aac aag gca gca cgc 2118
Val Ala Pro Ala Gln Arg Ser Asp Glu Ala Leu Asn Lys Ala Ala Arg
520 525 530

aaa cgc acc gaa gac aac caa ccg gtg cac agc ttc acc agc ctg ctc 2166
Lys Arg Thr Glu Asp Asn Gln Pro Val His Ser Phe Thr Ser Leu Leu
535 540 545 550

acc gac ctg gcc acc atc tgc gcc aac tac atc caa ccc aca gac gac 2214
Thr Asp Leu Ala Thr Ile Cys Ala Asn Tyr Ile Gln Pro Thr Asp Asp
555 560 565

-g cca gca ttc acc aaa acc acc acc ccc acc ccc aca caa cgg cgc 2262
Leu Pro Ala Phe Thr Lys Thr Thr Thr Pro Thr Pro Thr Gln Arg Arg
570 575 580

gcc ttc gac cta ctg gcc gtt tcc cac cgc cac ggc ctg gcg tag 2307
Ala Phe Asp Leu Leu Ala Val Ser His Arg His Gly Leu Ala
585 590 595

tcagtaccga accacaaatg cccaggtcaa cgacacaaac cgcgccggat cagggggaac 2367
ttcgggctag ccgggcgcgc cgg 2390



<210> 20
<211> 596
<212> PRT
<213> Mycobacterium canettii

<400> 20

Val Ser Ser Ser Asp Asp Ala Ile Ile Ala Leu Thr Ala Cys Tyr Lys
1 5 10 15

Val Ile Met Tyr Ile Thr Arg Val Pro Asn Arg Gly Ser Pro Pro Ala
20 25 30

Val Leu Leu Arg Glu Ser Phe Arg Glu Asn Gly Lys Val Lys Thr Arg
35 40 45

Thr Leu Ala Asn Leu Ser Arg Trp Pro Glu His Lys Leu Asp Arg Leu
50 55 60

Asp Arg Ala Leu Lys Gly Leu Pro Pro Ala Asp Trp Asp Leu Ala Glu
65 70 75 80

Ala Phe Asp Ile Thr Arg Ser Leu Pro His Gly His Val Ala Ala Val
85 90 95

Ala Gly Thr Ala Glu Lys Leu Gly Ile Pro Glu Leu Ile Asp Pro Thr
100 105 110

Pro Ser Arg Arg Arg Asn Leu Val Leu Ala Met Leu Ile Gly Gln Ile
115 120 125

Ile Glu Pro Gly Ser Lys Leu Ala Ile Ala Arg Gly Leu Arg Ala Gln
130 135 140

Thr Ala Thr Ser Thr Leu Gly Ala Val Leu Gly Val Ser Gly Ala Asp
145 150 155 160

Glu Asp Asp Leu Tyr Asp Ala Met Asp Trp Ala Leu Glu Arg Lys Asp
165 170 175

Gly Ile Glu Asn Ala Leu Ala Ala Arg His Leu Thr Asn Gly Thr Leu
180 185 190

Val Leu Tyr Asp Val Ser Ser Ala Ala Phe Glu Gly His Thr Cys Pro
195 200 205

Leu Gly Ala Ile Gly His Ala Arg Asp Gly Val Lys Gly Arg Leu Gln
210 215 220

Ile Val Tyr Gly Leu Leu Cys Ser Pro Lys Gly Ala Pro Val Ala Ile
225 230 235 240

Glu Val Phe Lys Gly Asn Thr Ala Asp Pro Lys Thr Leu Lys Ala Gln
245 250 255

Ile Asp Lys Leu Lys Thr Arg Phe Gly Leu Thr Arg Ile Ala Leu Val
260 265 270

Gly Asp Arg Gly Met Leu Thr Ser Ala Arg Ile Arg Asp Glu Leu Arg
275 280 285

Pro Ala His Leu Asp Trp Ile Ser Ala Leu Arg Ala Pro Gln Ile Lys
290 295 300

Ile Leu Leu Glu Asp Gly Ala Leu Gln Leu Ser Leu Phe Asp Glu Gln
305 310 315 320

Asn Leu Phe Glu Ile Thr His Pro Asp Tyr Pro Gly Glu Arg Leu Val
325 330 335

Cys Cys His Asn Pro Ala Leu Ala Asp Glu Arg Ala Arg Lys Arg Ala
340 345 350

Glu Leu Leu Ala Ala Thr Glu Lys Glu Leu Gln Ala Ile Ala Glu Ala
355 360 365

Thr Arg Arg Gln Arg Arg Pro Leu Arg Gly Thr Asp Lys Ile Gly Leu
370 375 380

Arg Val Gly Lys Val Arg Asn Lys Phe Lys Met Ala Lys His Phe Asp
385 390 395 400

Leu His Ile Thr Asp Glu Ala Phe Ser Phe Thr Arg Asn Gln Asn Ser
405 410 415

Ile Ala Ala Glu Ala Ala Leu Asp Gly Ile Tyr Val Leu Arg Thr Ser
420 425 430

Leu Pro Asp Asn Ala Leu Gly Arg Asp Asp Val Val Gly Arg Tyr Lys
435 440 445

Asp Leu Ala Asp Val Glu Arg Phe Phe Arg Thr Leu Asn Ser Glu Leu
450 455 460

Asp Val Arg Pro Ile Arg His Arg Leu Ala Asp Arg Val Arg Ala His
465 470 475 480

Met Phe Leu His Met Leu Ser Tyr Tyr Ile Ser Trp His Met Lys Gln
485 490 495

Ala Leu Ala Pro Ile Leu Phe Thr Asp Asn Asp Lys Pro Ala Ala Ala
500 505 510

Ala Lys Arg Ala Asp Pro Val Ala Pro Ala Gln Arg Ser Asp Glu Ala
515 520 525

Leu Asn Lys Ala Ala Arg Lys Arg Thr Glu Asp Asn Gln Pro Val His
530 535 540

Ser Phe Thr Ser Leu Leu Thr Asp Leu Ala Thr Ile Cys Ala Asn Tyr
545 550 555 560

Ile Gln Pro Thr Asp Asp Leu Pro Ala Phe Thr Lys Thr Thr Thr Pro
565 570 575

Thr Pro Thr Gln Arg Arg Ala Phe Asp Leu Leu Ala Val Ser His Arg
580 585 590

His Gly Leu Ala
595



<210> 21
<211> 1191
<212> DNA
<213> Mycobacterium tuberculosis

<220>
<221> CDS
<222> (1)..(1191)
<223> Fusion gene between mmpS6 and mmpL6 genes

<220>
<221> misc_feature
<222> (1)..(1191)
<223> CDS corresponds to fusion protein of rearranged forms
of mmpS6 and mmpL6

<400> 21

gtg cag ggg att tca gtg act ggc ctg gtc aaa cgc ggc tgg atg gtg 48
Val Gln Gly Ile Ser Val Thr Gly Leu Val Lys Arg Gly Trp Met Val
1 5 10 15

aga tcc gtc ttt gac acg atc gac ggt atc gac caa ctc ggc gag cag 96
Arg Ser Val Phe Asp Thr Ile Asp Gly Ile Asp Gln Leu Gly Glu Gln
20 25 30

ctg gcc agc gtg acc gta acc ttg gac aag ttg gct gcg atc cag cct 144
Leu Ala Ser Val Thr Val Thr Leu Asp Lys Leu Ala Ala Ile Gln Pro
35 40 45

caa ttg gtg gcg ctg cta cca gac gag atc gcc agc cag cag atc aat 192
Gln Leu Val Ala Leu Leu Pro Asp Glu Ile Ala Ser Gln Gln Ile Asn
50 55 60

cgg gaa ctg gcg ctg gct aac tac gcc acc atg tcc ggg atc tat gcc 240
Arg Glu Leu Ala Leu Ala Asn Tyr Ala Thr Met Ser Gly Ile Tyr Ala
65 70 75 80

cag acg gcg gcc ttg atc gaa aac gct gcc gcc atg gga caa gcc ttt 288
Gln Thr Ala Ala Leu Ile Glu Asn Ala Ala Ala Met Gly Gln Ala Phe
85 90 95

gac gcc gcc aag aac gac gac tcc ttc tat ctg ccg ccg gag gct ttt 336
Asp Ala Ala Lys Asn Asp Asp Ser Phe Tyr Leu Pro Pro Glu Ala Phe
100 105 110

gac aac cca gat ttc cag cgc ggc ctg aaa ttg ttc ctg tcg gca gac 384
Asp Asn Pro Asp Phe Gln Arg Gly Leu Lys Leu Phe Leu Ser Ala Asp
115 120 125

ggt aag gcg gct cgg atg atc atc tcc cat gaa ggc gat ccc gcc acc 432
Gly Lys Ala Ala Arg Met Ile Ile Ser His Glu Gly Asp Pro Ala Thr
130 135 140

ccc gaa ggc att tcg cat atc gac gcg atc aag cag gcg gcc cac gag 480
Pro Glu Gly Ile Ser His Ile Asp Ala Ile Lys Gln Ala Ala His Glu
145 150 155 160

gcc gtg aag ggc act ccc atg gcg ggt gct ggg atc tat ctg gcc ggc 528
Ala Val Lys Gly Thr Pro Met Ala Gly Ala Gly Ile Tyr Leu Ala Gly
165 170 175

acg gcc gcc acc ttc aag gac att caa gac ggc gcc acc tac gac ctc 576
Thr Ala Ala Thr Phe Lys Asp Ile Gln Asp Gly Ala Thr Tyr Asp Leu
180 185 190

ctg atc gcc gga ata gcc gcg ctg agc ttg att ttg ctc atc atg atg 624
Leu Ile Ala Gly Ile Ala Ala Leu Ser Leu Ile Leu Leu Ile Met Met
195 200 205

atc att acc cga agc ctg gtt gcg gcg ctg gtg atc gtg ggc acg gtg 672
Ile Ile Thr Arg Ser Leu Val Ala Ala Leu Val Ile Val Gly Thr Val
210 215 220

gcg ctg tcg ttg ggc gct tct ttt ggc ctg tcc gtg ctg gtg tgg cag 720
Ala Leu Ser Leu Gly Ala Ser Phe Gly Leu Ser Val Leu Val Trp Gln
225 230 235 240

cat ctt ctc ggt atc cag ttg tac tgg atc gtg ctc gcg ctg gcc gtc 768
His Leu Leu Gly Ile Gln Leu Tyr Trp Ile Val Leu Ala Leu Ala Val
245 250 255

atc ctg ctc ctg gcc gtg gga tcg gac tat aac ttg ctg ctg att tcc 816
Ile Leu Leu Leu Ala Val Gly Ser Asp Tyr Asn Leu Leu Leu Ile Ser
260 265 270

cga ttc aag gag gag atc ggt gca ggt ttg aac acc ggc atc atc cgt 864
Arg Phe Lys Glu Glu Ile Gly Ala Gly Leu Asn Thr Gly Ile Ile Arg
275 280 285

gcg atg gcc ggc acc ggc ggg gtg gtg acc gct gcc ggc ctg gtg ttc 912
Ala Met Ala Gly Thr Gly Gly Val Val Thr Ala Ala Gly Leu Val Phe
290 295 300

gcc gcc act atg tct tcg ttc gtg ttc agt gat ttg cgg gtc ctc ggt 960
Ala Ala Thr Met Ser Ser Phe Val Phe Ser Asp Leu Arg Val Leu Gly
305 310 315 320

cag atc ggg acc acc att ggt ctt ggg ctg ctg ttc gac acg ctg gtg 1008
Gln Ile Gly Thr Thr Ile Gly Leu Gly Leu Leu Phe Asp Thr Leu Val
325 330 335

gtg cgc gcg ttc atg acc ccg tcc atc gcg gtg ctg ctc ggg cgc tgg 1056
Val Arg Ala Phe Met Thr Pro Ser Ile Ala Val Leu Leu Gly Arg Trp
340 345 350

ttc tgg tgg ccg caa cga gtg cgc ccg cgc cct gcc agc agg atg ctt 1104
Phe Trp Trp Pro Gln Arg Val Arg Pro Arg Pro Ala Ser Arg Met Leu
355 360 365

cgg ccg tac ggc ccg cgg ccc gtg gtt cgt gaa ttg ctg ctg cgc gag 1152
Arg Pro Tyr Gly Pro Arg Pro Val Val Arg Glu Leu Leu Leu Arg Glu
370 375 380

ggc aac gat gac ccg aga act cag gtg gct acc cac cgt 1191
Gly Asn Asp Asp Pro Arg Thr Gln Val Ala Thr His Arg
385 390 395

<210> 22
<211> 397
<212> PRT
<213> Mycobacterium tuberculosis

<220>
<223> Fusion protein of rearranged forms of mmpS6 and mmpL6

<400> 22

Val Gln Gly Ile Ser Val Thr Gly Leu Val Lys Arg Gly Trp Met Val
1 5 10 15

Arg Ser Val Phe Asp Thr Ile Asp Gly Ile Asp Gln Leu Gly Glu Gln
20 25 30

Leu Ala Ser Val Thr Val Thr Leu Asp Lys Leu Ala Ala Ile Gln Pro
35 40 45

Gln Leu Val Ala Leu Leu Pro Asp Glu Ile Ala Ser Gln Gln Ile Asn
50 55 60

Arg Glu Leu Ala Leu Ala Asn Tyr Ala Thr Met Ser Gly Ile Tyr Ala
65 70 75 80

Gln Thr Ala Ala Leu Ile Glu Asn Ala Ala Ala Met Gly Gln Ala Phe
85 90 95

Asp Ala Ala Lys Asn Asp Asp Ser Phe Tyr Leu Pro Pro Glu Ala Phe
100 105 110

Asp Asn Pro Asp Phe Gln Arg Gly Leu Lys Leu Phe Leu Ser Ala Asp
115 120 125

Gly Lys Ala Ala Arg Met Ile Ile Ser His Glu Gly Asp Pro Ala Thr
130 135 140

Pro Glu Gly Ile Ser His Ile Asp Ala Ile Lys Gln Ala Ala His Glu
145 150 155 160

Ala Val Lys Gly Thr Pro Met Ala Gly Ala Gly Ile Tyr Leu Ala Gly
165 170 175

Thr Ala Ala Thr Phe Lys Asp Ile Gln Asp Gly Ala Thr Tyr Asp Leu
180 185 190

Leu Ile Ala Gly Ile Ala Ala Leu Ser Leu Ile Leu Leu Ile Met Met
195 200 205

Ile Ile Thr Arg Ser Leu Val Ala Ala Leu Val Ile Val Gly Thr Val
210 215 220

Ala Leu Ser Leu Gly Ala Ser Phe Gly Leu Ser Val Leu Val Trp Gln
225 230 235 240

His Leu Leu Gly Ile Gln Leu Tyr Trp Ile Val Leu Ala Leu Ala Val
245 250 255

Ile Leu Leu Leu Ala Val Gly Ser Asp Tyr Asn Leu Leu Leu Ile Ser
260 265 270

Arg Phe Lys Glu Glu Ile Gly Ala Gly Leu Asn Thr Gly Ile Ile Arg
275 280 285

Ala Met Ala Gly Thr Gly Gly Val Val Thr Ala Ala Gly Leu Val Phe
290 295 300

Ala Ala Thr Met Ser Ser Phe Val Phe Ser Asp Leu Arg Val Leu Gly
305 310 315 320

Gln Ile Gly Thr Thr Ile Gly Leu Gly Leu Leu Phe Asp Thr Leu Val
325 330 335

Val Arg Ala Phe Met Thr Pro Ser Ile Ala Val Leu Leu Gly Arg Trp
340 345 350

Phe Trp Trp Pro Gln Arg Val Arg Pro Arg Pro Ala Ser Arg Met Leu
355 360 365

Arg Pro Tyr Gly Pro Arg Pro Val Val Arg Glu Leu Leu Leu Arg Glu
370 375 380

Gly Asn Asp Asp Pro Arg Thr Gln Val Ala Thr His Arg
385 390 395
537-544, XP002087941
figure 1

table 1 
COLE S.T. ET AL.: "Mycobacterium
tuberculosis H37Rv complete genome
"T" later document published after the international filing date
"X" document of particular relevance; the claimed invention
cannot be considered novel or cannot be considered to
involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention

cannot be considered to involve an inventive step when the
document is combined with one or more other such
documents, such combination being obvious to a person skilled
in the art.
Database accession no. Z74020
the whole document
STEWART (FR); GARNIER THIERRY (FR); GORDON
ST) 21 September 2000 (2000-09-21) 
page 5, line 9 -page 17, line 21 
page 22, line 15 -page 32, line 3 

figure ID 

tables 1-3 

claims 3,6,14 

US 6 291 190 B1 (BEHR MARCEL ET AL)
18 September 2001 (2001-09-18) 
column 11, line 66 -column 18, line 58 
table 1 
1274-1282, XP000647583
ISSN: 0021-9193 

cited in the application 
figure 2 

GORDON S V ET AL: "IDENTIFICATION OF 

VARIABLE REGIONS IN THE GENOMES OF 

TUBERCLE BACILLI USING BACTERIAL ARTIFICIAL

CHROMOSOME ARRAYS" 

MOLECULAR MICROBIOLOGY, BLACKWELL 

SCIENTIFIC, OXFORD, GB, 

vol. 32, no. 3, May 1999 (1999-05), pages 

643-655, XP000933429 

ISSN: 0950-382X 

cited in the application 
tables 1-3 
15-19,
21,29, 
30,42, 
51-53 
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46-52, 
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55,58,59 



46-52, 
55,58,59 



Form PCT/ISA/210 (continuation of second sheet) (July 1992) 






BNSDOCID: <WO 0307098 1A3J_> 



INTERNATIONAL SEARCH REPORT 



International Application No 

PCT/IB 03/00986 



C (Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT 



Category 



Citation of document, with indication, where appropriate, of the relevant passages 



DATABASE TAXONOMY BROWSER [Online] 
NCBI; 
Host: http://www.ncbi.nih.gov, 
"Mycobacterium tuberculosis complex" 
XP002206354 
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=77643 
the whole document 

SREEVATSAN SRINAND ET AL: "Restricted 
SCIENCES OF THE UNITED STATES, 
vol. 94, no. 18, 1997, pages 9869-9874, 
XP002206250 
1997 
ISSN: 0027-8424 
page 9870, left-hand column 
table 1 
figure 1 

BROSCH R ET AL: "A new evolutionary 
scenario for the Mycobacterium 
tuberculosis complex." 
PROCEEDINGS OF THE NATIONAL ACADEMY OF 
SCIENCES OF THE UNITED STATES, 
vol. 99, no. 6, 
19 March 2002 (2002-03-19), pages 
3684-3689, XP002206251 
http://www.pnas.org March 19, 2002 
ISSN: 0027-8424 
the whole document 
-& DATABASE GENBANK [Online] 
NCBI; 16 March 2002 (2002-03-16) 
retrieved from 
the whole document 
Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet) 



This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons: 



1. Claims Nos.: 

because they relate to subject matter not required to be searched by this Authority, namely: 



2. [X] Claims Nos.: 6, 27-29, 51, 52, 58-59 (partially) 

because they relate to parts of the International Application that do not comply with the prescribed requirements to such 

an extent that no meaningful International Search can be carried out, specifically: 

see FURTHER INFORMATION sheet PCT/ISA/210 



3. Claims Nos.: 
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a). 



Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet) 



This International Searching Authority found multiple inventions in this international application, as follows: 



1. [ ] As all required additional search fees were timely paid by the applicant, this International Search Report covers all 
searchable claims. 



2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment 
of any additional fee. 



3. [ ] As only some of the required additional search fees were timely paid by the applicant, this International Search Report 
covers only those claims for which fees were paid, specifically claims Nos.: 



4. No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is 
restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 



Remark on Protest 



[ ] The additional search fees were accompanied by the applicant's protest. 
[ ] No protest accompanied the payment of additional search fees. 
Continuation of Box I.2 

Claims Nos.: 6,27-29,51,52,58-59 (partially) 

Present claims 6, 27-29, 51, 52, 58 and 59 relate to products defined by 
reference to a desirable characteristic or property, namely: 

- [...] specifically deleted in certain M. tuberculosis strains 

- primers defined by reference to claim 25, which relates to a method 
wherein primers [...] capable of amplifying a genomic region harbouring the TbD1 
[...] 

- primer pairs specific for [...] genetic markers (claim 51(b) and claim 52) 

- nucleotide sequences capable of hybridising with the genetic [...] the RD1, 
RD[...]; polypeptides [...] all the 
markers [...] capable of reacting with an antibody/immune serum raised against the 
same immunogenic molecules or fragments thereof (claim 59). 

The claims cover all products having this characteristic or property, 
whereas the application provides support within the meaning of Art. 6 PCT 
and is reproducible within the meaning of Art. 5 PCT for only a very 
limited number of such products. In the present case the claims so lack 
[...] 
the above reasoning, the claims also lack clarity (Art. 6 PCT). An 
[...] 
render a meaningful search over the whole of the claimed scope 
impossible. 

[...] 

[...] tuberculosis strains, but present in other Mycobacteria of the 
Mycobacterium tuberculosis complex (claim 6) 

- the sequences defined in claim 26 (claims [...]) 
- the primers specific for RD4 and RD9 as given by Table 1 (claim [...]) 

[...] 

are not defined, they were not searched at all. 

Additionally, claim 51 relates to an extremely large number of possible 
products. In fact, claim 51 contains so many options, variables and 
possible permutations that a lack of clarity and conciseness within the 
meaning of Art. 6 PCT arises to such an extent as to render a meaningful 
search of the claim impossible. Said claim relates to any combination of 
primers defined in claims 1-14, 17 and 18 with at least one primer pair 
specific for 24 different genetic markers. Moreover, the primers are 
defined in terms of the result to be achieved, namely by their 
specificity for the said 24 different genetic markers (supra) (Art. 6 
PCT). 

Consequently, the search has been carried out for those parts of the 
application which do appear to be clear and concise, namely a kit as 
defined by claim 52, wherein the primer pairs specific for RD4 and RD5 
are those given in Table 3. 



The nucleic acid referred to in claim 57 is defined by reference to claim 
53 which, however, does not relate to any nucleic acids. Claim 57 was 
thus interpreted as referring to claim 56 (Art. 6 PCT). 

The applicant's attention is drawn to the fact that claims, or parts of 
claims, relating to inventions in respect of which no international 
search report has been established need not be the subject of an 
international preliminary examination (Rule 66.1(e) PCT). The applicant 
is advised that the EPO policy when acting as an International 
Preliminary Examining Authority is normally not to carry out a 
preliminary examination on matter which has not been searched. This is 
the case irrespective of whether or not the claims are amended following 
receipt of the search report or during any Chapter II procedure. 